NUCLEIC-ACID PROGRAMMABLE PROTEIN ARRAYS 

BACKGROUND OF THE INVENTION 
The swift pace of discovery of new gene products by genomics and proteomics 
efforts and the growing availability of vast repositories of genes necessitate a strategy 
for analyzing proteins in a high-throughput manner. The high-density array format lends 
itself well to ordered high-throughput experimentation and analysis and has therefore 
become an established and widely-used format for high-throughput analysis of nucleic 
acids. Nucleic acid microarrays have enabled researchers to compare the expression of 
thousands of genes simultaneously. By making such comparisons, the expression 
patterns of clusters of genes in a particular biological context can be rapidly identified, 
which in turn can indicate groups of proteins that may act in concert in a specific pathway 
or process. 

Reports of the analysis of protein function on a large scale are only just emerging. 
For example, a large-scale analysis of gene function in S. cerevisiae has been performed 
using a transposon-tagging strategy for the genome-wide characterization of disruption 
phenotypes, gene expression, and protein localization (Ross-Macdonald et al., Nature 
402:413-418, 1999). In addition, complete two-hybrid analysis has been done using a 
large matrix of proteins for the interaction mapping of C. elegans proteins involved in 
vulval development (Walhout et al. (2000) Science 287:116-122) and the S. cerevisiae 
genome (Uetz et al. (2000) Nature 403:623-631; and Schwikowski (2000) Nature 
Biotech. 18:1257). 

The concept of nonliving peptide and protein arrays has drawn considerable 
attention because this approach to high-throughput experimentation allows the direct 
analysis of discrete protein binding and enzymatic activities without the complications of 
adverse in vivo effects. For example, a low-density (96 well format) protein array has 
been developed in which proteins were spotted onto a nitrocellulose membrane and 
biomolecular interactions were visualized by autoradiography (Ge, H. (2000) Nucleic 
Acids Res. 28:e3, I-VII). In another example, a high-density protein array (100,000 
samples within 222 X 222 mm) that was used for antibody screening was formed by 
spotting proteins onto polyvinylidene difluoride (PVDF) (Lueking et al. (1999) Anal. 
Biochem. 270:103-111). Proteins have been printed on a flat glass plate that contained 
wells formed by an enclosing hydrophobic Teflon mask, and the arrayed antigens were 
detected using enzyme-linked immunosorbent assay (ELISA) techniques (Mendoza et al. 
(1999) Biotechniques 27:778-788). A large-scale in vitro analysis of biochemical 
activity using affinity-purified yeast proteins has been performed in the context of an 
array of 6144 yeast strains, each bearing a plasmid expressing a different GST-ORF 
fusion (Martzen et al. (1999) Science 286:1153-1155). Proteins have been covalently 
linked to chemically derivatized flat glass slides in a high-density array (1600 spots per 
square centimeter), and protein-protein and protein-small molecule interactions were 
detected by fluorescence or radioactive decay (MacBeath and Schreiber (2000) Science 
289:1760-1763). De Wildt et al. generated a high-density array of 18,342 bacterial 
clones, each expressing a different single-chain antibody, for screening antibody-antigen 
interactions (De Wildt et al. (2000) Nature Biotech. 18:989-994). 

SUMMARY OF THE INVENTION 

The inventors have discovered, among other things, that arrays of polypeptides 
can be generated by translation of nucleic acid sequences encoding the polypeptides at 
individual addresses on the array. This allows for the rapid and versatile development of 
a polypeptide microarray platform for analyzing and manipulating biological information. 

In one aspect, the invention features an array including a substrate having a 
plurality of addresses. Each address of the plurality includes: (1) a nucleic acid (e.g., a 
DNA or an RNA) encoding a hybrid amino acid sequence which includes a test amino 
acid sequence and an affinity tag; and, optionally, (2) a binding agent that recognizes the 
affinity tag. Optionally, each address of the plurality also includes one or both of (i) an 
RNA polymerase; and (ii) a translation effector. 

In a preferred embodiment, each test amino acid sequence in the plurality of 
addresses is unique. For example, a test amino acid sequence can differ from all other 
test amino acid sequences of the plurality by 1 or more amino acid differences (e.g., 
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800, 
256, 128, 64, 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the 
test amino acid sequence encoded by the nucleic acid at each address of the plurality is 
identical to all other test amino acid sequences in the plurality of addresses. In a 
preferred embodiment, the affinity tag encoded by the nucleic acid at each address of the 
plurality is the same, or substantially identical to all other affinity tags in the plurality of 
addresses. In another preferred embodiment, the nucleic acid at each address of the 
plurality encodes more than one affinity tag. In yet another preferred embodiment, the 
affinity tag encoded by the nucleic acid at an address of the plurality differs from at least 
one other affinity tag in the plurality of addresses. 

In a preferred embodiment, the affinity tag is fused directly to the test amino acid 
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another 
preferred embodiment, the affinity tag is separated from the test amino acid by one or 
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, 
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can 
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably 
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or 
carboxy-terminal to the test amino acid sequence. 
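
The tag, linker, and test sequence arrangements described above can be illustrated with a minimal sketch. The function name, the hexahistidine tag, the glycine-rich linker, and the short test sequence are all assumptions chosen for illustration; none of them is specified by the text.

```python
def build_hybrid_sequence(test_seq: str, affinity_tag: str,
                          linker: str = "", tag_position: str = "N") -> str:
    """Assemble a hybrid amino acid sequence (hypothetical helper).

    tag_position "N" places the affinity tag (and linker) amino-terminal to
    the test sequence; "C" places them carboxy-terminal. An empty linker
    corresponds to a direct fusion.
    """
    if tag_position == "N":
        return affinity_tag + linker + test_seq
    if tag_position == "C":
        return test_seq + linker + affinity_tag
    raise ValueError("tag_position must be 'N' or 'C'")


# Illustrative values only (not taken from the specification).
direct_fusion = build_hybrid_sequence("MKTAYIAK", "HHHHHH")
linked_fusion = build_hybrid_sequence("MKTAYIAK", "HHHHHH",
                                      linker="GGSGG", tag_position="C")
```

Here `direct_fusion` models a directly amino-terminal tag, and `linked_fusion` models a carboxy-terminal tag separated from the test sequence by five linker residues.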

The nucleic acid can be an RNA or a DNA (e.g., a single-stranded DNA or a 
double-stranded DNA). In a preferred embodiment, the nucleic acid includes a plasmid 
DNA or a fragment thereof; an amplification product (e.g., a product generated by RCA, 
PCR, NASBA); or a synthetic DNA. 

The nucleic acid can further include one or more of: a transcription promoter; a 
transcription regulatory sequence; an untranslated leader sequence; a sequence encoding a 
cleavage site; a recombination site; a 3' untranslated sequence; a transcriptional 
terminator; and an internal ribosome entry site. In one embodiment, the nucleic acid 
sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the 
sequence is dicistronic or polycistronic. In another embodiment, the nucleic acid also 
includes a sequence encoding a reporter protein, e.g., a protein whose abundance can be 
quantitated and can provide an indication of the quantity of test polypeptide fixed to the 
plate. The reporter protein can be attached to the test polypeptide, e.g., covalently 
attached, e.g., attached as a translational fusion. The reporter protein can be an enzyme, 
e.g., β-galactosidase, chloramphenicol acetyl transferase, β-glucuronidase, and so forth. 
The reporter protein can produce or modulate light, e.g., a fluorescent protein (e.g., green 
fluorescent protein, variants thereof, red fluorescent protein, variants thereof, and the 
like), and luciferase. 

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter, 
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase 
promoter. The regulatory components, e.g., the transcription promoter, can vary among 
nucleic acids at different addresses of the plurality. For example, different promoters can 
be used to vary the amount of polypeptide produced at different addresses. 

In one embodiment, the nucleic acid also includes at least one site for 
recombination, e.g., homologous recombination or site-specific recombination, e.g., a 
lambda att site or variant thereof; a lox site; or a FLP site. In a preferred embodiment, 
the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a 
test amino acid sequence. In another preferred embodiment, the recombination site 
includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid 
sequence. 
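
Whether a recombination site carries a stop codon in the reading frame of the test sequence can be checked with a short scan. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the standard stop codons (TAA, TAG, TGA) are assumed, and the example sequences are toy strings rather than real recombination sites.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def stop_codons_in_frame(site: str, frame_offset: int = 0) -> List[int]:
    """Return 0-based start positions of stop codons read in one frame.

    frame_offset is the number of leading bases (0, 1, or 2) to skip so that
    reading starts on a codon boundary of the flanking coding sequence.
    """
    if frame_offset not in (0, 1, 2):
        raise ValueError("frame_offset must be 0, 1, or 2")
    seq = site.upper().replace("U", "T")  # accept DNA or RNA spelling
    return [
        i
        for i in range(frame_offset, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


# Toy sequences for illustration only.
in_frame_hits = stop_codons_in_frame("ATGTAAGGC")       # codons ATG, TAA, GGC
shifted_hits = stop_codons_in_frame("ATGTAAGGC", 1)     # codons TGT, AAG
```

An empty list corresponds to the embodiment in which the site lacks stop codons in the relevant frame; a non-empty list corresponds to the embodiment in which the site includes one.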

In another embodiment, the nucleic acid includes a sequence encoding a cleavage 
site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin 
site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical 
cleavage site (e.g., a methionine, preferably a unique methionine (cleavage by cyanogen 
bromide) or a proline (cleavage by formic acid)). 

The nucleic acid can include a sequence encoding a second polypeptide tag in 
addition to the affinity tag. The second tag can be C-terminal to the test amino acid 
sequence and the affinity tag can be N-terminal to the test amino acid sequence; the 
second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be 
C-terminal to the test amino acid sequence; the second tag and the affinity tag can be 
adjacent to one another, or separated by a linker sequence, both being N-terminal or C- 
terminal to the test amino acid sequence. In one embodiment, the second tag is an 
additional affinity tag, e.g., the same or different from the first tag. In another 
embodiment, the second tag is a recognition tag. For example, the recognition tag can 
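report the presence and/or amount of test polypeptide at an address. Preferably the 
recognition tag has a sequence other than the sequence of the affinity tag. In still another 
embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20 
tags) are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality 
can be the same as or different from the first affinity tag. 

Attorney Docket No: 00246-260001/H1803 

The nucleic acid sequence can further include an identifier sequence, e.g., a non- 
coding nucleic acid sequence, e.g., one that is synthetically inserted, and allows for 
uniquely identifying the nucleic acid sequence. The identifier sequence can be sufficient 
in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500, 10 
to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier can be selected so 
that it is not complementary or identical to another identifier or any region of each 
nucleic acid sequence of the plurality on the array. 

The test amino acid sequence can further include a protein splicing sequence or 
intein. The intein can be inserted in the middle of a test amino acid sequence. The intein 
can be a naturally-occurring intein or a mutated intein. 

The nucleic acids encoding the test amino acid sequences can be obtained from a 
collection of full-length expressed genes (e.g., a repository of clones), a cDNA library, or 
a genomic library. The encoding nucleic acids can be nucleic acids (e.g., an mRNA or 
cDNA) expressed in a tissue, e.g., a normal or diseased tissue. The test polypeptides (i.e., 
test amino acid sequences) can be mutants or variants of a scaffold protein (e.g., an 
antibody, zinc-finger, polypeptide hormone, etc.). In yet another embodiment, the test 
polypeptides are random amino acid sequences, patterned amino acid sequences, or 
designed amino acid sequences (e.g., sequences designed by manual, rational, or 
computer-aided approaches). The plurality of test amino acid sequences can include a 
plurality from a first source, and a plurality from a second source. For example, the test 
amino acid sequences on half the addresses of an array are from a diseased tissue or a 
first species, whereas the sequences on the remaining half are from a normal tissue or a 
second species. 

In a preferred embodiment, each address of the plurality further includes one or 
more second nucleic acids, e.g., a plurality of unique nucleic acids. Hence, the plurality 
in toto can encode a plurality of test sequences. For example, each address of the 
plurality can encode a pool of test polypeptide sequences, e.g., a subset of a library or 
clone bank. A second array can be provided in which each address of the plurality of the 
second array includes a single or subset of members of the pool present at an address of 
the first array. The first and the second array can be used consecutively. 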

In other preferred embodiments, each address of the plurality further includes a 
second nucleic acid encoding a second amino acid sequence. 

In one preferred embodiment, each address of the plurality includes a first test 
amino acid sequence that is common to all addresses of the plurality, and a second test 
amino acid sequence that is unique among all the addresses of the plurality. For example, 
the second test amino acid sequences can be query sequences whereas the first test 
amino acid sequence can be a target sequence. In another preferred embodiment, each 
address of the plurality includes a first test amino acid sequence that is unique among all 
the addresses of the plurality, and a second test amino acid sequence that is common to 
all addresses of the plurality. For example, the first test amino acid sequences can be 
query sequences whereas the second test amino acid sequence can be a target 
sequence. The second nucleic acid encoding the second test amino acid sequence can 
include a sequence encoding a recognition tag and/or an affinity tag. 

At at least one address of the plurality, the first and second amino acid sequences 
can be such that they interact with one another. In one preferred embodiment, they are 
capable of binding to each other. The second test amino acid sequence is optionally 
fused to a detectable amino acid sequence, e.g., an epitope tag, an enzyme, a fluorescent 
protein (e.g., GFP, BFP, variants thereof). The second test amino acid sequence can be 
itself detectable (e.g., an antibody is available which specifically recognizes it). In 
another preferred embodiment, one is capable of modifying the other (e.g., making or 
breaking a bond, preferably a covalent bond, of the other). For example, the first amino 
acid sequence is a kinase capable of phosphorylating the second amino acid sequence; the 
first is a methylase capable of methylating the second; the first is a ubiquitin ligase 
capable of ubiquitinating the second; the first is a protease capable of cleaving the 
second; and so forth. 

These embodiments can be used to identify an interaction or to identify a 
compound that modulates, e.g., inhibits or enhances, an interaction. 



The binding agent can be attached to the substrate. For example, the substrate can 
be derivatized and the binding agent covalently attached thereto. The binding agent can be 
attached via a bridging moiety, e.g., a specific binding pair (e.g., the substrate contains a 
first member of a specific binding pair, and the binding agent is linked to the second 
member of the binding pair, the second member being attached to the substrate). 

In yet another embodiment, an insoluble substrate (e.g., a bead or particle) is 
disposed at each address of the plurality, and the binding agent is attached to the 
insoluble substrate. The insoluble substrate can further contain information encoding its 
identity, e.g., a reference to the address on which it is disposed. The insoluble substrate 
can be tagged using a chemical tag, or an electronic tag (e.g., a transponder). The 
insoluble substrate can be disposed such that it can be removed for later analysis. 

Also featured is a database, e.g., in computer memory or a computer readable 
medium. Each record of the database can include a field for the amino acid sequence 
encoded by the nucleic acid sequence and a descriptor or reference for the physical 
location of the nucleic acid sequence on the array. Optionally, the record also includes a 
field representing a result (e.g., a qualitative or quantitative result) of detecting the 
polypeptide encoded by the nucleic acid sequence. The database can include a record for 
each address of the plurality present on the array. The records can be clustered or have a 
reference to other records (e.g., including hierarchical groupings) based on the result. 
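
One way such a record might be laid out in computer memory is sketched below. The class name, field names, the (row, column) form of the location, the numeric result, and the threshold-based grouping are all assumptions for illustration; the text does not prescribe a schema or a clustering method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AddressRecord:
    """One database record per array address (hypothetical schema)."""
    location: Tuple[int, int]        # physical location on the array, here (row, column)
    amino_acid_sequence: str         # sequence encoded by the nucleic acid at this address
    result: Optional[float] = None   # optional detection result; None if not measured


def group_by_result(records: List[AddressRecord],
                    threshold: float) -> Dict[str, List[Tuple[int, int]]]:
    """Group record locations by comparing each result to an assumed threshold."""
    groups: Dict[str, List[Tuple[int, int]]] = {
        "positive": [], "negative": [], "unmeasured": []
    }
    for rec in records:
        if rec.result is None:
            groups["unmeasured"].append(rec.location)
        elif rec.result >= threshold:
            groups["positive"].append(rec.location)
        else:
            groups["negative"].append(rec.location)
    return groups


# Made-up records for illustration only.
records = [
    AddressRecord((0, 0), "MKTAYIAK", result=0.92),
    AddressRecord((0, 1), "MSLNDRVQ", result=0.08),
    AddressRecord((1, 0), "MGHWEEPL"),
]
groups = group_by_result(records, threshold=0.5)
```

The grouping step stands in for the clustering of records based on the result; a hierarchical grouping could be layered on the same record type.
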

In another aspect, the invention features an array including a substrate having a 
plurality of addresses. Each address of the plurality includes: (1) an RNA encoding a 
hybrid amino acid sequence comprising a test amino acid sequence and an affinity tag; 
and (2) a binding agent that recognizes the affinity tag. Optionally, each address of the 
plurality also includes one or both of (i) a transcription effector; and (ii) a translation 
effector. 

In a preferred embodiment, each test amino acid sequence in the plurality of 
addresses is unique. For example, a test amino acid sequence can differ from all other 
test amino acid sequences of the plurality by 1 or more amino acid differences (e.g., 
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800, 
256, 128, 64, 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the 
test amino acid sequence encoded by the nucleic acid at each address of the plurality is 
identical to all other test amino acid sequences in the plurality of addresses. In a 
preferred embodiment, the affinity tag encoded by the nucleic acid at each address of the 
plurality is the same, or substantially identical to all other affinity tags in the plurality of 
addresses. In another preferred embodiment, the nucleic acid at each address of the 
plurality encodes more than one affinity tag. In yet another preferred embodiment, the 
affinity tag encoded by the nucleic acid at an address of the plurality differs from at least 
one other affinity tag in the plurality of addresses. 

In a preferred embodiment, the affinity tag is fused directly to the test amino acid 
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another 
preferred embodiment, the affinity tag is separated from the test amino acid by one or 
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, 
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can 
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably 
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or 
carboxy-terminal to the test amino acid sequence. 

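The nucleic acid can further include one or more of: an untranslated leader 
sequence; and an internal ribosome entry site. In one embodiment, the nucleic acid 
sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the 
sequence is dicistronic or polycistronic. In another embodiment, the nucleic acid also 
includes a sequence encoding a reporter protein, e.g., a protein whose abundance can be 
quantitated and can provide an indication of the quantity of test polypeptide fixed to the 
plate. The reporter protein can be attached to the test polypeptide, e.g., covalently 
attached, e.g., attached as a translational fusion. The reporter protein can be an enzyme, 
e.g., β-galactosidase, chloramphenicol acetyl transferase, β-glucuronidase, and so forth. 
The reporter protein can produce or modulate light, e.g., a fluorescent protein (e.g., green 
fluorescent protein, variants thereof, red fluorescent protein, variants thereof, and the 
like), and luciferase. 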

In one embodiment, the nucleic acid also includes at least one site for 
recombination, e.g., homologous recombination or site-specific recombination, e.g., a 
lambda att site or variant thereof; a lox site; or a FLP site. In a preferred embodiment, 
the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a 
test amino acid sequence. In another preferred embodiment, the recombination site 
includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid 
sequence. 

In another embodiment, the nucleic acid includes a sequence encoding a cleavage 
site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin 
site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical 
cleavage site (e.g., a methionine, preferably a unique methionine (cleavage by cyanogen 
bromide) or a proline (cleavage by formic acid)). 

The nucleic acid can include a sequence encoding a second polypeptide tag in 
addition to the affinity tag. The second tag can be C-terminal to the test amino acid 
sequence and the affinity tag can be N-terminal to the test amino acid sequence; the 
second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be 
C-terminal to the test amino acid sequence; the second tag and the affinity tag can be 
adjacent to one another, or separated by a linker sequence, both being N-terminal or C- 
terminal to the test amino acid sequence. In one embodiment, the second tag is an 
additional affinity tag, e.g., the same or different from the first tag. In another 
embodiment, the second tag is a recognition tag. For example, the recognition tag can 
report the presence and/or amount of test polypeptide at an address. Preferably the 
recognition tag has a sequence other than the sequence of the affinity tag. In still another 
embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20 
tags) are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality 
can be the same as or different from the first affinity tag. 

The nucleic acid sequence can further include an identifier sequence, e.g., a non- 
coding nucleic acid sequence, e.g., one that is synthetically inserted, and allows for 
uniquely identifying the nucleic acid sequence. The identifier sequence can be sufficient 
in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500, 10 
to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier can be selected so 
that it is not complementary or identical to another identifier or any region of each 
nucleic acid sequence of the plurality on the array. 
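
The selection rule for identifiers can be expressed as a simple screen. This is a minimal sketch, assuming DNA identifiers over the alphabet A, C, G, T and treating "complementary" as an exact reverse-complement match; the function names and toy sequences are hypothetical, and a practical screen would probably also consider partial complementarity.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def identifier_conflicts(identifiers: Dict[str, str],
                         sequences: Dict[str, str]) -> List[Tuple[str, str]]:
    """List identifiers that match, or are complementary to, another
    identifier or any region of the array's nucleic acid sequences.

    identifiers maps a name to its identifier sequence; sequences maps a
    name to a nucleic acid sequence with the identifier itself excluded.
    """
    conflicts: List[Tuple[str, str]] = []
    names = list(identifiers)
    for i, name in enumerate(names):
        ident = identifiers[name].upper()
        rc = reverse_complement(ident)
        # Compare against every later identifier (each pair checked once).
        for other in names[i + 1:]:
            if identifiers[other].upper() in (ident, rc):
                conflicts.append((name, "identifier:" + other))
        # Compare against every region of every encoding sequence.
        for seq_name, seq in sequences.items():
            s = seq.upper()
            if ident in s or rc in s:
                conflicts.append((name, "sequence:" + seq_name))
    return conflicts


# Toy data for illustration only.
ids = {"a": "AAACCC", "b": "GGGTTT", "c": "ACACAC"}
seqs = {"a": "ATGGCTACACACTAA"}
found = identifier_conflicts(ids, seqs)
```

In this toy case identifier "b" is the reverse complement of "a", and identifier "c" occurs inside the encoding sequence, so both would be rejected under the stated rule.
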

The test amino acid sequence can further include a protein splicing sequence or 
intein. The intein can be inserted in the middle of a test amino acid sequence. The intein 
can be a naturally-occurring intein or a mutated intein. 

The nucleic acids encoding the test amino acid sequences can be obtained from a 
collection of full-length expressed genes (e.g., a repository of clones), a cDNA library, or 
a genomic library. The encoding nucleic acids can be nucleic acids (e.g., an mRNA or 
cDNA) expressed in a tissue, e.g., a normal or diseased tissue. The test polypeptides (i.e., 
test amino acid sequences) can be mutants or variants of a scaffold protein (e.g., an 
antibody, zinc-finger, polypeptide hormone, etc.). In yet another embodiment, the test 
polypeptides are random amino acid sequences, patterned amino acid sequences, or 
designed amino acid sequences (e.g., sequences designed by manual, rational, or 
computer-aided approaches). The plurality of test amino acid sequences can include a 
plurality from a first source, and a plurality from a second source. For example, the test 
amino acid sequences on half the addresses of an array are from a diseased tissue or a 
first species, whereas the sequences on the remaining half are from a normal tissue or a 
second species. 

In a preferred embodiment, each address of the plurality further includes one or 
more second nucleic acids, e.g., a plurality of unique nucleic acids. Hence, the plurality 
in toto can encode a plurality of test sequences. For example, each address of the 
plurality can encode a pool of test polypeptide sequences, e.g., a subset of a library or 
clone bank. A second array can be provided in which each address of the plurality of the 
second array includes a single or subset of members of the pool present at an address of 
the first array. The first and the second array can be used consecutively. 
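
The consecutive use of a pooled first array and a second array can be sketched as a two-stage layout. The function names, the fixed pool size, and the choice to place exactly one pool member at each second-array address are assumptions for illustration only.

```python
from typing import List, Tuple


def build_pooled_array(library: List[str], pool_size: int) -> List[List[str]]:
    """First array: each address holds a pool (a subset of the library)."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return [library[i:i + pool_size] for i in range(0, len(library), pool_size)]


def build_second_array(first_array: List[List[str]],
                       addresses_of_interest: List[int]) -> List[Tuple[int, str]]:
    """Second array: one member per address, drawn from the pools found at
    the chosen first-array addresses. Each entry records the originating
    first-array address alongside the member."""
    second: List[Tuple[int, str]] = []
    for addr in addresses_of_interest:
        for member in first_array[addr]:
            second.append((addr, member))
    return second


# Hypothetical clone names for illustration.
library = ["clone1", "clone2", "clone3", "clone4", "clone5", "clone6"]
first = build_pooled_array(library, pool_size=3)
second = build_second_array(first, addresses_of_interest=[1])
```

Here the first array has two addresses of three clones each, and the second array resolves the pool at address 1 into its individual members.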

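In other preferred embodiments, each address of the plurality further includes a 
second nucleic acid encoding a second amino acid sequence. 

In one preferred embodiment, each address of the plurality includes a first test 
amino acid sequence that is common to all addresses of the plurality, and a second test 
amino acid sequence that is unique among all the addresses of the plurality. For example, 
the second test amino acid sequences can be query sequences whereas the first test 
amino acid sequence can be a target sequence. In another preferred embodiment, each 
address of the plurality includes a first test amino acid sequence that is unique among all 
the addresses of the plurality, and a second test amino acid sequence that is common to 
all addresses of the plurality. For example, the first test amino acid sequences can be 
query sequences whereas the second test amino acid sequence can be a target 
sequence. The second nucleic acid encoding the second test amino acid sequence can 
include a sequence encoding a recognition tag and/or an affinity tag. 

At at least one address of the plurality, the first and second amino acid sequences 
can be such that they interact with one another. In one preferred embodiment, they are 
capable of binding to each other. The second test amino acid sequence is optionally 
fused to a detectable amino acid sequence, e.g., an epitope tag, an enzyme, a fluorescent 
protein (e.g., GFP, BFP, variants thereof). The second test amino acid sequence can be 
itself detectable (e.g., an antibody is available which specifically recognizes it). In 
another preferred embodiment, one is capable of modifying the other (e.g., making or 
breaking a bond, preferably a covalent bond, of the other). For example, the first amino 
acid sequence is a kinase capable of phosphorylating the second amino acid sequence; the 
first is a methylase capable of methylating the second; the first is a ubiquitin ligase 
capable of ubiquitinating the second; the first is a protease capable of cleaving the 
second; and so forth. 

These embodiments can be used to identify an interaction or to identify a 
compound that modulates, e.g., inhibits or enhances, an interaction. 

The binding agent can be attached to the substrate. For example, the substrate can 
be derivatized and the binding agent covalently attached thereto. The binding agent can be 
attached via a bridging moiety, e.g., a specific binding pair (e.g., the substrate contains a 
first member of a specific binding pair, and the binding agent is linked to the second 
member of the binding pair, the second member being attached to the substrate). 

In yet another embodiment, an insoluble substrate (e.g., a bead or particle) is 
disposed at each address of the plurality, and the binding agent is attached to the 
insoluble substrate. The insoluble substrate can further contain information encoding its 
identity, e.g., a reference to the address on which it is disposed. The insoluble substrate 
can be tagged using a chemical tag, or an electronic tag (e.g., a transponder). The 
insoluble substrate can be disposed such that it can be removed for later analysis. 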

In still another aspect, the invention features an array including a substrate having 
a plurality of addresses. Each address of the plurality includes: (1) a polypeptide 
comprising a test amino acid sequence and an affinity tag; and (2) a binding 
agent. The binding agent is optionally capable of attaching to the affinity tag of the 
polypeptide. Optionally, each address of the plurality also includes a translation effector 
and/or a transcription effector. 

In a preferred embodiment, each test amino acid sequence in the plurality of 
addresses is unique. For example, a test amino acid sequence can differ from all other 
test amino acid sequences of the plurality by 1 or more amino acid differences (e.g., 
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800, 
256, 128, 64, 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the 
test amino acid sequence of the polypeptide is identical to all other test amino acid 
sequences in the plurality of addresses. In a preferred embodiment, the affinity tag of the 
polypeptide at each address of the plurality is the same, or substantially identical to all 
other affinity tags in the plurality of addresses. 

In a preferred embodiment, the polypeptide has more than one affinity tag. In 
another embodiment, the polypeptide of an address has an affinity tag that differs from at 
least one other affinity tag of a polypeptide in the plurality of addresses. 

In a preferred embodiment, the affinity tag is fused directly to the test amino acid 
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another 
preferred embodiment, the affinity tag is separated from the test amino acid by one or 
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, 
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can 
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably 
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or 
carboxy-terminal to the test amino acid sequence. 

In another embodiment, each address of the plurality further includes a nucleic 
acid. The nucleic acid at each address of the plurality encodes the polypeptide. The 
nucleic acid can be an RNA or a DNA (e.g., a single-stranded DNA or a double-stranded 
DNA). In a preferred embodiment, the nucleic acid includes a plasmid DNA or a 
fragment thereof; an amplification product (e.g., a product generated by RCA, PCR, 
NASBA); or a synthetic DNA. 

The nucleic acid can further include one or more of: a transcription promoter; a 
transcription regulatory sequence; an untranslated leader sequence; a sequence encoding a 
cleavage site; a recombination site; a 3' untranslated sequence; a transcriptional 
terminator; and an internal ribosome entry site. In one embodiment, the nucleic acid 
sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the 
sequence is dicistronic or polycistronic. 

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter, 
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase 
promoter. The regulatory components, e.g., the transcription promoter, can vary among 
nucleic acids at different addresses of the plurality. For example, different promoters can 
be used to vary the amount of polypeptide produced at different addresses. 

In one embodiment, the nucleic acid also includes at least one site for 
recombination, e.g., homologous recombination or site-specific recombination, e.g., a 
lambda att site or variant thereof; a lox site; or a FLP site. In a preferred embodiment, 
the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a 
test amino acid sequence. In another preferred embodiment, the recombination site 
includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid 
sequence. 

The nucleic acid sequence can further include an identifier sequence, e.g., a non- 
coding nucleic acid sequence, e.g., one that is synthetically inserted, and allows for 
uniquely identifying the nucleic acid sequence. The identifier sequence can be sufficient 
in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500, 10 
to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier can be selected so 
that it is not complementary or identical to another identifier or any region of each 
nucleic acid sequence of the plurality on the array. 

In another embodiment, the polypeptide further includes a reporter protein, e.g., a 
protein whose abundance can be quantitated and can provide an indication of the quantity 
of test polypeptide fixed to the plate. The reporter protein can be attached to the test 
polypeptide, e.g., covalently attached, e.g., attached as a translational fusion. The 
reporter protein can be an enzyme, e.g., β-galactosidase, chloramphenicol acetyl 
transferase, β-glucuronidase, and so forth. The reporter protein can produce or modulate 
light, e.g., a fluorescent protein (e.g., green fluorescent protein, variants thereof, red 
fluorescent protein, variants thereof, and the like), and luciferase. 

In another embodiment, the polypeptide includes a cleavage site, e.g., a protease 
site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin site, an enterokinase 
site, a PreScission site, a factor Xa site, or a TEV site), or a chemical cleavage site (e.g., a 
methionine, preferably a unique methionine (cleavage by cyanogen bromide) or a proline 
(cleavage by formic acid)). 

The polypeptide can also include a sequence encoding a second polypeptide tag in 
addition to the affinity tag. The second tag can be C-terminal to the test amino acid 
sequence and the affinity tag can be N-terminal to the test amino acid sequence; the 
second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be 
C-terminal to the test amino acid sequence; the second tag and the affinity tag can be 
adjacent to one another, or separated by a linker sequence, both being N-terminal or C- 
terminal to the test amino acid sequence. In one embodiment, the second tag is an 
additional affinity tag, e.g., the same or different from the first tag. In another 
embodiment, the second tag is a recognition tag. For example, the recognition tag can 
report the presence and/or amount of test polypeptide at an address. Preferably the 
recognition tag has a sequence other than the sequence of the affinity tag. In still another 
embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20 
tags) are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality 
can be the same as or different from the first affinity tag. 

The test amino acid sequence can further include a protein splicing sequence or 
intein. The intein can be inserted in the middle of a test amino acid sequence. The intein 
can be a naturally-occurring intein or a mutated intein. 

A variety of test amino acid sequences can be disposed at different addresses of 
the plurality. For example, the test amino acid sequences can be peptides expressed 
in a tissue, e.g., a normal or diseased tissue. The test polypeptides can be mutants or 
variants of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide hormone, etc.). 
In yet another embodiment, the test polypeptides are random amino acid sequences, 
patterned amino acid sequences, or designed amino acid sequences (e.g., sequences 
designed by manual, rational, or computer-aided approaches). The plurality of test amino 
acid sequences can include a plurality from a first source, and a plurality from a second 
source. For example, the test amino acid sequences on half the addresses of an array are 
from a diseased tissue or a first species, whereas the sequences on the remaining half are 
from a normal tissue or a second species. 

In a preferred embodiment, each address of the plurality further includes one or 
more second polypeptides. Hence, the plurality, in toto, can encode a plurality of test 
polypeptides. For example, each address of the plurality can include a pool of test 
polypeptide sequences, e.g., a subset of polypeptides encoded by a library or clone bank. 
A second array can be provided in which each address of the plurality of the second array 
includes a single or subset of members of the pool present at an address of the first array. 
The first and the second array can be used consecutively. 

In other preferred embodiments, each address of the plurality further includes a 
second polypeptide. 

In one preferred embodiment, each address of the plurality includes a first test 
amino acid sequence that is common to all addresses of the plurality, and a second test 
amino acid sequence that is unique among all the addresses of the plurality. For example, 
the second test amino acid sequences can be query sequences whereas the first test 
amino acid sequence can be a target sequence. In another preferred embodiment, each 
address of the plurality includes a first test amino acid sequence that is unique among all 
the addresses of the plurality, and a second test amino acid sequence that is common to 
all addresses of the plurality. For example, the first test amino acid sequences can be 
query sequences whereas the second test amino acid sequence can be a target 
sequence. The second test amino acid sequence can include a recognition tag and/or an 
affinity tag. 

At at least one address of the plurality, the first and second amino acid sequences
can be such that they interact with one another. In one preferred embodiment, they are
capable of binding to each other. The second test amino acid sequence is optionally
fused to a detectable amino acid sequence, e.g., an epitope tag, an enzyme, a fluorescent
protein (e.g., GFP, BFP, variants thereof). The second test amino acid sequence can be
itself detectable (e.g., an antibody is available which specifically recognizes it).

attached via a bridging moiety, e.g., a specific binding pair
member of a specific binding pair, and the binding agent is linked to

In yet another embodiment, an insoluble substrate (e.g., a bead

using a chemical tag, or an electronic tag (e.g., a transponder).
can be disposed such that it can be removed for later analysis.

Also featured is a database, e.g., in computer memory or a computer readable medium.

Also featured is a method of providing an array. The method includes:
(1) providing a substrate with a plurality of addresses; and (2) providing at each address
of the plurality at least (i) a nucleic acid encoding an amino acid sequence comprising a
a transcription effector, and (ii) a translation effector. Optionally, the
Irian — polypeptide, e.g., an unbound H^eor^ 

or other complex macromolecules, e.g., complex sugars, lipids, or matrix

w „ an air- or water-resistant package. The array can be defeated, 

^-example, anan.ycanherap^^anerheingoption^con^ 
■ T recant This step can be done at any point in the process (e.g., before or 

, *• ff^tnr- or before or after washing the array). The packaged, proa 

a translation effector, a vector nucleic acid, an antibody, and so forth.

In a preferred embodiment, each test amino acid sequence in the plurality of
addresses is unique. For example, a test amino acid sequence can differ from all other
test amino acid sequences of the plurality, and, by way of example, has about 800,






Attorney Docket No: 00246-260001 /HI 803

preferred embodiment, the affinity tag is separated from the am 

T^i-.p"*"*-**-* «• nucleic te,udesaplasm,d 

^ ppp NASBA); or a synthetic DNA. 

further include one or more of: a transcription promoter, a

and an internal ribosome entry site.
a plurality of cistrons (also termed "open reading frames"), e.g.,

attacheo, e.g., ^-a™ B-elucuronidase, and so forth. 

e.g., β-galactosidase, chloramphenicol acetyl transferase,

e.g., a fluorescent protein (e.g., green

r.r;,:r.rr,,r— 

like), and luciferase. 
The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter,

The regulatory components, e.g., the transcription promoter, can vary among

be used to vary the amount of polypeptide produced at different addresses.

In one embodiment, the nucleic acid also includes at least one site for

acid sequence. In another preferred embodiment, the recombination site
includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid

In another embodiment, the nucleic acid includes a sequence encoding a cleavage
a PreScission site, a factor Xa site, or a TEV site),

the affinity tag can be N-terminal to the test amino acid sequence; the

C-terminal to the test amino acid sequence; the second tag and the

adjacent to one another, or separated by a linker sequence, both being N-terminal or
C-terminal to the test amino acid sequence. In one embodiment, the second tag is an

a plurality of polypeptide tags (e.g., less than 3, 4, 5, about
can be the same as or different from the first affinity tag.
The nucleic acid sequence can further include an identifier sequence, e.g., a non-coding
nucleic acid sequence, e.g., one that is synthetically inserted, and allows for
uniquely identifying the nucleic acid sequence. The identifier sequence can be sufficient
in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500, 10
to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier can be selected so
that it is not complementary or identical to another identifier or any region of each
nucleic acid sequence of the plurality on the array.

The test amino acid sequence can further include a protein splicing sequence or
intein. The intein can be inserted in the middle of a test amino acid sequence. The intein
can be a naturally-occurring intein or a mutated intein.

The nucleic acid sequences encoding the test amino acid sequences can be
obtained from a collection of full-length expressed genes (e.g., a repository of clones), a
cDNA library, or a genomic library. The test amino acid sequences can be genes
expressed in a tissue, e.g., a normal or diseased tissue. The test polypeptides can be
mutants or variants of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide
hormone etc.). In yet another embodiment, the test polypeptides are random amino acid
sequences, patterned amino acid sequences, or designed amino acid sequences (e.g.,
sequences designed by manual, rational, or computer-aided approaches). The plurality of
test amino acid sequences can include a plurality from a first source, and a plurality from a
second source. For example, the test amino acid sequences on half the addresses of an
array are from a diseased tissue or a first species, whereas the sequences on the remaining
half are from a normal tissue or a second species.

In a preferred embodiment, each address of the plurality further includes one or
more second nucleic acids, e.g., a plurality of unique nucleic acids. Hence, the plurality
in toto can encode a plurality of test sequences. For example, each address of the
plurality can encode a pool of test polypeptide sequences, e.g., a subset of a library or
clone bank. A second array can be provided in which each address of the plurality of the
second array includes a single or subset of members of the pool present at an address of
the first array. The first and the second array can be used consecutively.

In other preferred embodiments, each address of the plurality further includes a
second nucleic acid encoding a second amino acid sequence.

In one preferred embodiment, each address of the plurality includes a first test
amino acid sequence that is common to all addresses of the plurality, and a second test
amino acid sequence that is unique among all the addresses of the plurality. For example,
the second test amino acid sequences can be query sequences whereas the first test
amino acid sequence can be a target sequence. In another preferred embodiment, each
address of the plurality includes a first test amino acid sequence that is unique among all
the addresses of the plurality, and a second test amino acid sequence that is common to
all addresses of the plurality. For example, the first test amino acid sequences can be
query sequences whereas the second test amino acid sequence can be a target
sequence. The second nucleic acid encoding the second test amino acid sequence can
include a sequence encoding a recognition tag and/or an affinity tag.

At at least one address of the plurality, the first and second amino acid sequences
can be such that they interact with one another. In one preferred embodiment, they are
capable of binding to each other. The second test amino acid sequence is optionally
fused to a detectable amino acid sequence, e.g., an epitope tag, an enzyme, a fluorescent
protein (e.g., GFP, BFP, variants thereof). The second test amino acid sequence can be
itself detectable (e.g., an antibody is available which specifically recognizes it). The
method can further include detecting the second test amino acid sequence at each address
of the plurality, e.g., by detecting the detectable amino acid sequence (e.g., the epitope
tag, enzyme or fluorescent protein).

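In another preferred embodiment, one is capable of modifying the other (e.g.,
making or breaking a bond, preferably a covalent bond, of the other). For example, the
first amino acid sequence is a kinase capable of phosphorylating the second amino acid
sequence; the first is a methylase capable of methylating the second; the first is a
ubiquitin ligase capable of ubiquitinating the second; the first is a protease capable of
cleaving the second; and so forth. The method can further include detecting the
modification at each address of the plurality.

These embodiments can be used to identify an interaction or to identify a
compound that modulates, e.g., inhibits or enhances, an interaction.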
attached via a bridging moiety, e.g., a specific binding pair, (e.g., the substrate contains a
first member of a specific binding pair, and the binding agent is linked to the second
member of the binding pair, the second member being attached to the substrate).

In yet another embodiment, an insoluble substrate (e.g., a bead or particle), is
disposed at each address of the plurality, and the binding agent is attached to the
insoluble substrate. The insoluble substrate can further contain information encoding its
identity, e.g., a reference to the address on which it is disposed. The insoluble substrate
can be tagged using a chemical tag, or an electronic tag (e.g., a transponder). The
insoluble substrate can be disposed such that it can be removed for later analysis.

The method can further include providing a database, e.g., in computer memory
or a computer readable medium. Each record of the database can include a field for the
amino acid sequence encoded by the nucleic acid sequence and a descriptor or reference
for the physical location of the nucleic acid sequence on the array. The database can
include a record for each address of the plurality present on the array. Optionally, the
record also includes a field representing a result (e.g., a qualitative or quantitative
result) of detecting the polypeptide encoded by the nucleic acid sequence, and the
method includes entering the result into the record. The method can also further include
clustering or grouping the records based on the result.
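
By way of illustration only, the database records described above could be sketched as follows. The field names, the address labels, the example sequences, and the simple threshold-based grouping are all assumptions made for this sketch; the text does not prescribe a schema or a clustering method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressRecord:
    """One record per array address (field names are illustrative)."""
    address: str                    # descriptor of the physical location, e.g. "A1"
    amino_acid_sequence: str        # sequence encoded by the nucleic acid at the address
    result: Optional[float] = None  # quantitative detection result, if measured


def group_by_result(records, threshold):
    """Group addresses into 'detected', 'not_detected' and 'unmeasured' bins."""
    groups = defaultdict(list)
    for rec in records:
        if rec.result is None:
            groups["unmeasured"].append(rec.address)
        elif rec.result >= threshold:
            groups["detected"].append(rec.address)
        else:
            groups["not_detected"].append(rec.address)
    return dict(groups)


# Hypothetical records; the sequences and values are made up for the example.
records = [
    AddressRecord("A1", "MKTAYIAK", 0.92),
    AddressRecord("A2", "MSDNELLR", 0.08),
    AddressRecord("A3", "MGSSHHHH"),
]
groups = group_by_result(records, threshold=0.5)
```

A two-bin threshold is the simplest possible grouping; any clustering of the result field could be substituted without changing the record layout.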

The invention also features a method of providing an array to a user. The method
includes providing the user with a substrate having a plurality of addresses and a vector
nucleic acid. The vector nucleic acid can include one or more sites for insertion of a test
amino acid sequence (e.g., a recombination site or a restriction site), and a sequence
encoding an affinity tag. In a preferred embodiment, the vector nucleic acid has two sites
for insertion, and a toxic gene inserted between the two sites. In another embodiment, the
sites for insertion are homologous recombination or site-specific recombination sites,
e.g., a lambda att site or variant thereof; a lox site; or a FLP site. In a preferred
embodiment, one or both recombination sites lack stop codons in the reading frame of a
nucleic acid encoding a test amino acid sequence. In another preferred embodiment, one
or both recombination sites include a stop codon in the reading frame of a nucleic acid
encoding a test amino acid sequence.
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In
another preferred embodiment, the affinity tag is separated from the test amino acid by one or
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino
acids, preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or
carboxy-terminal to the test amino acid sequence. The cleavage site can be a protease

(cleavage by formic acid)).

In a preferred embodiment, the method includes providing the user with at least a
second vector nucleic acid. The second vector nucleic acid can include one or more sites
for insertion of a test amino acid sequence (e.g., a recombination site or a restriction

Multiple nucleic acids can be provided, each having a unique test amino
acid sequence, e.g., for disposal at a unique address of the substrate. The method can

e.g., an epitope tag, an enzyme, a fluorescent protein (e.g., GFP,

In a preferred embodiment, each test amino acid sequence in the plurality of

8, 16, 32, 64 or more differences; and, by
identical to all other test amino acid sequences in the plurality of addresses.

The first and/or second vector nucleic acid can further include one or more of: a
transcription promoter; a transcription regulatory sequence; an untranslated leader
sequence; a sequence encoding a cleavage site; a recombination site; a 3' untranslated
sequence; a transcriptional terminator; and an internal ribosome entry site. In one
embodiment, the nucleic acid sequence includes a plurality of cistrons (also termed "open
reading frames"), e.g., the sequence is dicistronic or polycistronic. In another
embodiment, the nucleic acid also includes a sequence encoding a reporter protein, e.g., a
protein whose abundance can be quantitated and can provide an indication of the quantity
of test polypeptide fixed to the plate. The reporter protein can be attached to the test
polypeptide, e.g., covalently attached, e.g., attached as a translational fusion. The
reporter protein can be an enzyme, e.g., β-galactosidase, chloramphenicol acetyl
transferase, β-glucuronidase, and so forth. The reporter protein can produce or modulate
light, e.g., a fluorescent protein (e.g., green fluorescent protein, variants thereof, red
fluorescent protein, variants thereof, and the like), and luciferase.

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter,
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase
promoter.

In a preferred embodiment, the method further includes contacting the vector 
nucleic acid, and optionally the second vector nucleic acid, with a test nucleic acid which 
includes a nucleic acid encoding a test amino acid sequence so as to insert the test amino 
acid sequence into the vector nucleic acid. The test nucleic acid can be flanked, e.g., on 
both ends by a site, e.g., a site compatible with the vector nucleic acid (e.g., having 
sequence for recombination with a sequence in the vector; or having a restriction site 
which leaves an overhang or blunt end such that the overhang or blunt end can be ligated 
into the vector nucleic acid (e.g., the restricted vector nucleic acid)). The contact step 
can include contacting the vector nucleic acid with a recombinase, a ligase, and/or a 
restriction endonuclease. For example, the recombinase can mediate recombination, e.g., 
site-specific recombination or homologous recombination, between a recombination site 
on the test nucleic acid and a recombination sequence on the vector nucleic acid. 

In a preferred embodiment, each address of the plurality has a binding agent 
capable of recognizing the affinity tag. The binding agent can be attached to the 
substrate. For example, the substrate can be derivatized and the binding agent covalently
attached thereto. The binding agent can be attached via a bridging moiety, e.g., a specific
binding pair, (e.g., the substrate contains a first member of a specific binding pair, and the 
binding agent is linked to the second member of the binding pair, the second member 
being attached to the substrate). 

In yet another embodiment, an insoluble substrate (e.g., a bead or particle), is 
disposed at each address of the plurality, and the binding agent is attached to the 
insoluble substrate. The insoluble substrate can further contain information encoding its 
identity, e.g., a reference to the address on which it is disposed. The insoluble substrate 
can be tagged using a chemical tag, or an electronic tag (e.g., a transponder). The 
insoluble substrate can be disposed such that it can be removed for later analysis.

In a preferred embodiment, the method further includes disposing at an address of
the plurality a vector nucleic acid that includes a nucleic acid encoding a test amino acid
sequence. This step can be repeated until a vector nucleic acid is disposed at each
address of the plurality. In embodiments using a second vector nucleic acid in addition to
the first, the method can include disposing at each address of the plurality a second vector
nucleic acid encoding a different test amino acid sequence from the first vector nucleic
acid.

In another preferred embodiment, the method further includes disposing at an
address of the plurality a vector nucleic acid that does not include a nucleic acid encoding
a test amino acid sequence and concurrently or separately disposing a nucleic acid
encoding a test amino acid sequence. This step can be repeated until a vector nucleic
acid is disposed at each address of the plurality. The method can also further include
contacting each address of the plurality with a recombinase or a ligase.

The first or second vector nucleic acid can include a sequence encoding a second 
polypeptide tag in addition to the affinity tag. The second tag can be C-terminal to the 
test amino acid sequence and the affinity tag can be N-terminal to the test amino acid 
sequence; the second tag can be N-terminal to the test amino acid sequence, and the 
affinity tag can be C-terminal to the test amino acid sequence; the second tag and the 
affinity tag can be adjacent to one another, or separated by a linker sequence, both being 
N-terminal or C-terminal to the test amino acid sequence. In one embodiment, the 
second tag is an additional affinity tag, e.g., the same or different from the first tag. In
another embodiment, the second tag is a recognition tag. For example, the recognition 
tag can report the presence and/or amount of test polypeptide at an address. Preferably 
the recognition tag has a sequence other than the sequence of the affinity tag. In still 
another embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or 
about 20 tags) are encoded in addition to the first affinity tag. Each polypeptide tag of 
the plurality can be the same as or different from the first affinity tag. 

The first or second vector nucleic acid sequence can further include a sequence
encoding a protein splicing sequence or intein. The intein can be inserted in the middle
of a test amino acid sequence. The intein can be a naturally-occurring intein or a mutated
intein.

The nucleic acids encoding the test amino acid sequences can be obtained from a
collection of full-length expressed genes (e.g., a repository of clones), a cDNA library, or
a genomic library. The encoding nucleic acids can be nucleic acids (e.g., an mRNA or
cDNA) expressed in a tissue, e.g., a normal or diseased tissue. The test polypeptides (i.e.,
test amino acid sequences) can be mutants or variants of a scaffold protein (e.g., an
antibody, zinc-finger, polypeptide hormone etc.). In yet another embodiment, the test
polypeptides are random amino acid sequences, patterned amino acid sequences, or
designed amino acid sequences (e.g., sequences designed by manual, rational, or
computer-aided approaches). The plurality of test amino acid sequences can include a 
plurality from a first source, and plurality from a second source. For example, the test 
amino acid sequences on half the addresses of an array are from a diseased tissue or a 
first species, whereas the sequences on the remaining half are from a normal tissue or a 
second species. 

The method can further include detecting the first or the second test amino acid 
sequence at each address of the plurality. 

In another preferred embodiment using a first and a second vector nucleic acid, 
one test amino acid sequence is capable of modifying the other (e.g., making or breaking 
a bond, preferably a covalent bond, of the other). For example, the first amino acid 
sequence is a kinase capable of phosphorylating the second amino acid sequence; the first
is a methylase capable of methylating the second; the first is a ubiquitin ligase capable of
ubiquitinating the second; the first is a protease capable of cleaving the second; and so
forth. The method can further include detecting the modification at each address of the
plurality.

These embodiments can be used to identify an interaction or to identify a
compound that modulates, e.g., inhibits or enhances, an interaction.

In another aspect, the invention features a method of providing an array of
polypeptides. The method includes: (1) providing or obtaining a substrate with a
plurality of addresses, each address of the plurality including (i) a nucleic acid encoding
a hybrid amino acid sequence comprising a test amino acid sequence and an affinity tag, and
(ii) a binding agent that recognizes the affinity tag; (2) contacting each address of the
plurality with a translation effector to thereby translate the hybrid amino acid sequence;
and (3) maintaining the substrate under conditions permissive for the amino acid
sequence to bind the binding agent.

In one embodiment, the nucleic acid provided on the substrate is synthesized in
situ, e.g., by light-directed chemistry. In another embodiment, each address of the
plurality is provided with a nucleic acid, e.g., by pipetting, spotting, printing (e.g., with
pins), piezoelectric delivery, or, e.g., other means of mechanical delivery. In a preferred 
embodiment, the provided nucleic acid is a template nucleic acid, and the method further 
includes amplifying the template, e.g., by PCR, NASBA, or RCA. The method can 
further include transcribing the nucleic acid to produce one or more RNA molecules 
encoding the test amino acid sequence. 

The method can further include washing the substrate, e.g., after sufficient contact 
with a translation effector. The wash step can be repeated, e.g., one or more times, e.g., 
until a translation effector or translation effector component is removed. The wash step 
can remove unbound proteins. The stringency of the wash step can vary, e.g., the salt, 
pH, and buffer composition of the wash buffer can vary. For example, if the translated 
test polypeptide is covalently captured, or captured by an interaction resistant to 
chaotropes (e.g., binding of a 6-histidine motif to Ni²⁺-NTA), the substrate can be washed
with a chaotrope, (e.g., guanidinium hydrochloride, or urea). In a subsequent step, the 
chaotrope can itself be washed from the array, and the polypeptides renatured. 

In one embodiment, the nucleic acid sequence also encodes a cleavage site, e.g., a
protease site, e.g., between the test amino acid sequence and the affinity tag. The method 
can further include contacting an address of the array with a protease that specifically 
recognizes the site. 

The method can further include contacting the substrate with a second substrate. 
For example, in an embodiment wherein the substrate is a gel, the gel can be contacted 
with a second gel, and the contents of one gel can be transferred to another (e.g., by 
diffusion or electrophoresis). The method can include disrupting the binding between the 
affinity tag and the binding agent or between the binding agent and the substrate prior to 
transfer. 

The method can further include contacting the substrate with living cells, and 
detecting an address wherein a parameter of the cell is altered relative to another address. 

In a preferred embodiment, each test amino acid sequence in the plurality of 
addresses is unique. For example, a test amino acid sequence can differ from all other 
test amino acid sequences of the plurality by 1 or more amino acid differences (e.g.,
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800,
256, 128, 64, 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the
test amino acid sequence encoded by the nucleic acid at each address of the plurality is 
identical to all other test amino acid sequences in the plurality of addresses. In a 
preferred embodiment, the affinity tag encoded by the nucleic acid at each address of the 
plurality is the same, or substantially identical to all other affinity tags in the plurality of 
addresses. In another preferred embodiment, the nucleic acid at each address of the 
plurality encodes more than one affinity tag. In yet another preferred embodiment, the 
affinity tag encoded by the nucleic acid at an address of the plurality differs from at least 
one other affinity tag in the plurality of addresses. 
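
As a hedged illustration of the uniqueness criterion above, the number of amino acid differences between test sequences could be counted as sketched below. The position-by-position comparison, the treatment of length differences as additional differences, and the example sequences are assumptions of this sketch; the text does not define how differences are to be counted.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count positions at which two sequences differ.

    Any difference in length is counted as additional differences
    (an assumption; the text does not define the measure).
    """
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches + abs(len(seq_a) - len(seq_b))


def all_unique(sequences, min_differences=1):
    """True if every pair of sequences differs by at least min_differences."""
    return all(
        count_differences(a, b) >= min_differences
        for a, b in combinations(sequences, 2)
    )


# Hypothetical test amino acid sequences, one per address.
test_sequences = ["MKTAYIAK", "MKTAYIAR", "MKSAYLAK"]
```

With these example sequences, `all_unique(test_sequences)` holds for a minimum of one difference but not for a minimum of two, because the first two sequences differ at a single position.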

In a preferred embodiment, the affinity tag is fused directly to the test amino acid 
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another 
preferred embodiment, the affinity tag is separated from the test amino acid sequence by one or
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, 
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can 
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably 
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or
carboxy-terminal to the test amino acid sequence. 

The nucleic acid can further include one or more of: a transcription promoter; a
transcription regulatory sequence; an untranslated leader sequence; a sequence encoding a
cleavage site; a recombination site; a 3' untranslated sequence; a transcriptional
terminator; and an internal ribosome entry site. In one embodiment, the nucleic acid
sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the
sequence is dicistronic or polycistronic. In another embodiment, the nucleic acid also
includes a sequence encoding a reporter protein, e.g., a protein whose abundance can be
quantitated and can provide an indication of the quantity of test polypeptide fixed to the
plate. The reporter protein can be attached to the test polypeptide, e.g., covalently
attached, e.g., attached as a translational fusion. The reporter protein can be an enzyme,
e.g., β-galactosidase, chloramphenicol acetyl transferase, β-glucuronidase, and so forth.
The reporter protein can produce or modulate light, e.g., a fluorescent protein (e.g., green
fluorescent protein, variants thereof, red fluorescent protein, variants thereof, and the
like), and luciferase.

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter,
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase
promoter. The regulatory components, e.g., the transcription promoter, can vary among
nucleic acids at different addresses of the plurality. For example, different promoters can 
be used to vary the amount of polypeptide produced at different addresses. 

In one embodiment, the nucleic acid also includes at least one site for 
recombination, e.g., homologous recombination or site-specific recombination, e.g., a 
lambda att site or variant thereof; a lox site; or a FLP site. In a preferred embodiment, 
the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a 
test amino acid sequence. In another preferred embodiment, the recombination site 
includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid 
sequence. 

In another embodiment, the nucleic acid includes a sequence encoding a cleavage 
site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin 
site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical 
cleavage site (e.g., a methionine, preferably a unique methionine (cleavage by cyanogen
bromide) or a proline (cleavage by formic acid)). 
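
For illustration, the presence of a site-specific protease site in an encoded fusion polypeptide could be checked as sketched below. The recognition motifs are commonly cited consensus sequences and are assumptions of this sketch rather than sequences given in the text; the fusion sequence is hypothetical.

```python
import re

# Commonly cited consensus motifs (assumptions, not taken from the text).
# Each value is (motif regex, offset of the scissile bond within the match).
PROTEASE_SITES = {
    "thrombin": ("LVPRGS", 4),
    "enterokinase": ("DDDDK", 5),
    "factor Xa": ("IEGR", 4),
    "TEV": ("ENLYFQ[GS]", 6),
    "PreScission": ("LEVLFQGP", 6),
}


def find_cleavage_sites(sequence):
    """Return {protease: [cut positions]}.

    A position p means the bond between residues p and p + 1
    (1-based numbering) is the one cleaved.
    """
    hits = {}
    for name, (motif, offset) in PROTEASE_SITES.items():
        cuts = [m.start() + offset for m in re.finditer(motif, sequence)]
        if cuts:
            hits[name] = cuts
    return hits


# Hypothetical fusion: six-histidine tag, TEV site, then a test sequence.
fusion = "MHHHHHHENLYFQGMKTAYIAK"
sites = find_cleavage_sites(fusion)
```

Placing the site between the affinity tag and the test amino acid sequence, as in the hypothetical fusion, is what allows the test polypeptide to be released from the tag by the corresponding protease.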

The nucleic acid can include a sequence encoding a second polypeptide tag in 
addition to the affinity tag. The second tag can be C-terminal to the test amino acid 
sequence and the affinity tag can be N-terminal to the test amino acid sequence; the 
second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be 
C-terminal to the test amino acid sequence; the second tag and the affinity tag can be 
adjacent to one another, or separated by a linker sequence, both being N-terminal or C- 
terminal to the test amino acid sequence. In one embodiment, the second tag is an 
additional affinity tag, e.g., the same or different from the first tag. In another
embodiment, the second tag is a recognition tag. For example, the recognition tag can
report the presence and/or amount of test polypeptide at an address. Preferably the
recognition tag has a sequence other than the sequence of the affinity tag. In still another
embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20
tags) are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality
can be the same as or different from the first affinity tag.

The nucleic acid sequence can further include an identifier sequence, e.g., a non-coding
nucleic acid sequence, e.g., one that is synthetically inserted, and allows for
uniquely identifying the nucleic acid sequence. The identifier sequence can be sufficient
in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500, 10 
to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier can be selected so 
that it is not complementary or identical to another identifier or any region of each 
nucleic acid sequence of the plurality on the array. 
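
One way such identifiers might be selected is sketched below, assuming DNA sequences written with the letters A, C, G and T. The greedy selection, the function names, and the example sequences are hypothetical and serve only to illustrate the criterion that an identifier is neither identical nor complementary to another identifier or to any region of a nucleic acid sequence on the array.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_acceptable(candidate, accepted, array_sequences):
    """Reject a candidate that is identical or complementary to an accepted
    identifier, or that occurs on either strand of any array sequence."""
    rc = reverse_complement(candidate)
    if any(candidate == ident or rc == ident for ident in accepted):
        return False
    return not any(candidate in s or rc in s for s in array_sequences)


def select_identifiers(candidates, array_sequences, count):
    """Greedily pick `count` acceptable identifiers from `candidates`."""
    accepted = []
    for cand in candidates:
        if is_acceptable(cand, accepted, array_sequences):
            accepted.append(cand)
        if len(accepted) == count:
            return accepted
    raise ValueError("not enough acceptable identifiers among candidates")


# Hypothetical array sequences and 10-nucleotide candidate identifiers.
array_sequences = ["ATGGCTAGCAAAGGAGAAGAACTT", "ATGTCTGATAACGAACTGCTGCGT"]
candidates = ["GCTAGCAAAG", "ACGTTGCATC", "GATGCAACGT", "TTCAGGACCA"]
identifiers = select_identifiers(candidates, array_sequences, count=2)
```

In this example the first candidate is rejected because it occurs within an array sequence, and the third because it is the reverse complement of an identifier already accepted.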

The test amino acid sequence can further include a protein splicing sequence or 
intein. The intein can be inserted in the middle of a test amino acid sequence. The intein 
can be a naturally-occurring intein or a mutated intein. 

The nucleic acid sequences encoding the test amino acid sequences can be 
obtained from a collection of full-length expressed genes (e.g., a repository of clones), a 
cDNA library, or a genomic library. The test amino acid sequences can be genes 
expressed in a tissue, e.g., a normal or diseased tissue. The test polypeptides can be 
mutants or variants of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide 
hormone etc.). In yet another embodiment, the test polypeptides are random amino acid
sequences, patterned amino acids sequences, or designed amino acids sequences (e.g., 
sequence designed by manual, rational, or computer-aided approaches). The plurality of 
test amino acid sequences can include a plurality from a first source, and plurality from a 
second source. For example, the test amino acid sequences on half the addresses of an 
array are from a diseased tissue or a first species, whereas the sequences on the remaining 
half are from a normal tissue or a second species. 

In a preferred embodiment, each address of the plurality further includes one or 
more second nucleic acids, e.g., a plurality of unique nucleic acids. Hence, the plurality 
in toto can encode a plurality of test sequences. For example, each address of the
plurality can encode a pool of test polypeptide sequences, e.g., a subset of a library or
clone bank. A second array can be provided in which each address of the plurality of the
second array includes a single or subset of members of the pool present at an address of
the first array. The first and the second array can be used consecutively.
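
The consecutive use of a pooled first array and a second array can be sketched as follows. The clone names, the address labels, and the choice to lay out one pool member per second-array address are assumptions made for illustration; a second array could equally carry smaller subsets of each pool.

```python
def build_second_array(first_array_pools, positive_addresses):
    """Lay out one pool member per address for each positive first-array address.

    first_array_pools: {address: [clone ids pooled at that address]}
    positive_addresses: first-array addresses that scored in the assay
    Returns {second-array address: clone id}; address labels are hypothetical.
    """
    second_array = {}
    for pool_address in positive_addresses:
        for index, clone in enumerate(first_array_pools[pool_address], start=1):
            second_array[f"{pool_address}.{index}"] = clone
    return second_array


# Hypothetical first array with three clones pooled at each address.
first_array = {
    "A1": ["clone_001", "clone_002", "clone_003"],
    "A2": ["clone_004", "clone_005", "clone_006"],
}
second_array = build_second_array(first_array, positive_addresses=["A2"])
```

Only the pools that score on the first array are expanded, so the second array resolves which individual member of a positive pool accounts for the result.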

In other preferred embodiments, each address of the plurality further includes a
second nucleic acid encoding a second amino acid sequence.

In one preferred embodiment, each address of the plurality includes a first test
amino acid sequence that is common to all addresses of the plurality, and a second test
amino acid sequence that is unique among all the addresses of the plurality. For example,
the second test amino acid sequences can be query sequences whereas the first amino test 
amino acid sequence can be a target sequence. In another preferred embodiment, each 
address of the plurality includes a first test amino acid sequence that is unique among all 
the addresses of the plurality, and a second test amino acid sequence that is common to 
all addresses of the plurality. For example, the first test amino acid sequences can be 
query sequences whereas the second amino test amino acid sequence can be a target 
sequence. The second nucleic acid encoding the second test amino acid sequence can 
include a sequence encoding a recognition tag and/or an affinity tag. 

At at least one address of the plurality, the first and second amino acid sequences 
can be such that they interact with one another. In one preferred embodiment, they are 
capable of binding to each other. The second test amino acid sequence is optionally 
fused to a detectable amino acid sequence, e.g., an epitope tag, an enzyme, or a fluorescent
protein (e.g., GFP, BFP, or variants thereof). The second test amino acid sequence can be
itself detectable (e.g., an antibody is available which specifically recognizes it). The 
method can further include detecting the second test amino acid sequence at each address 
of the plurality, e.g., by detecting the detectable amino acid sequence (e.g., the epitope 
tag, enzyme or fluorescent protein). 

In another preferred embodiment, one of the first and second amino acid sequences
is capable of modifying the other (e.g., making or breaking a bond, preferably a covalent
bond, of the other). For example, the first amino acid sequence is a kinase capable of
phosphorylating the second amino acid sequence; the first is a methylase capable of
methylating the second; the first is a ubiquitin ligase capable of ubiquitinating the
second; the first is a protease capable of cleaving the second; and so forth. The method
can further include detecting the modification at each address of the plurality.

These embodiments can be used to identify an interaction or to identify a
compound that modulates, e.g., inhibits or enhances, an interaction.

The binding agent can be attached to the substrate. For example, the substrate can 
be derivatized and the binding agent covalently attached thereto. The binding agent can be
attached via a bridging moiety, e.g., a specific binding pair (e.g., the substrate contains a
first member of a specific binding pair, and the binding agent is linked to the second
member of the binding pair, the second member being attached to the substrate).

In yet another embodiment, an insoluble substrate (e.g., a bead or particle) is disposed at
each address of the plurality, and the binding agent is attached to the insoluble substrate.
The insoluble substrate can further contain information encoding its identity, e.g., a 
reference to the address on which it is disposed. The insoluble substrate can be tagged 
using a chemical tag, or an electronic tag (e.g., a transponder). The insoluble substrate 
can be disposed such that it can be removed for later analysis. 

In another aspect, the invention features a method of evaluating, e.g., identifying, a
polypeptide-polypeptide interaction. The method includes: (1) providing or obtaining a 
substrate with a plurality of addresses, each address of the plurality comprising (i) a first 
nucleic acid encoding an amino acid sequence comprising a first amino acid sequence 
and an affinity tag, (ii) a binding agent that recognizes the affinity tag, and (iii) a second 
nucleic acid encoding a second amino acid sequence; (2) contacting each address of the
plurality with a translation effector to thereby translate the first nucleic acid and the
second nucleic acid to synthesize the first and second amino acid sequences; and
optionally (3) maintaining the substrate under conditions permissive for the hybrid amino
acid sequence (the first amino acid sequence with the affinity tag) to bind the binding agent.

In one preferred embodiment, the first amino acid sequence is common to all
addresses of the plurality, and a second test amino acid sequence is unique among all the
addresses of the plurality. For example, the second test amino acid sequences can be
query sequences whereas the first test amino acid sequence can be a target
sequence. In another preferred embodiment, the first amino acid sequence is unique
among all the addresses of the plurality, and the second amino acid sequence is common
to all addresses of the plurality. For example, the first test amino acid sequences can be
query sequences whereas the second test amino acid sequence can be a target
sequence. The second nucleic acid encoding the second test amino acid sequence can
include a sequence encoding a recognition tag and/or an affinity tag.

The method can further include detecting the presence of the second amino acid 
sequence at each of the plurality of addresses.

In one preferred embodiment, the second nucleic acid sequence also encodes a
polypeptide tag. The polypeptide tag can be an epitope (e.g., recognized by a monoclonal
antibody), or a binding agent (e.g., avidin or streptavidin, GST, or chitin binding protein).
The detection of the second amino acid sequence can entail contacting each address of 
the plurality with a binding agent, e.g., a labeled biotin moiety, labeled glutathione, 
labeled chitin, a labeled antibody, etc. In another embodiment, each address of the 
plurality is contacted with an antibody specific to the second amino acid sequence. 

In another preferred embodiment, the second nucleic acid sequence includes a 
recognition tag. The recognition tag can be an epitope tag, enzyme or fluorescent 
protein. Examples of enzymes include horseradish peroxidase, alkaline phosphatase, 
luciferase, or cephalosporinase. The method can further include contacting each address 
of the plurality with an appropriate cofactor and/or substrate for the enzyme. Examples 
of fluorescent proteins include green fluorescent protein (GFP), and variants thereof, e.g., 
enhanced GFP, blue fluorescent protein (BFP), cyan FP, etc. The detection of the 
second amino acid sequence can entail monitoring fluorescence, assessing enzyme
activity, or measuring an added binding agent, e.g., a labeled biotin moiety, a labeled
antibody, etc.

In another preferred embodiment, one of the first and second amino acid sequences
is capable of modifying the other (e.g., making or breaking a bond, preferably a covalent
bond, of the other). For example, the first amino acid sequence is a kinase capable of
phosphorylating the second amino acid sequence; the first is a methylase capable of
methylating the second; the first is a ubiquitin ligase capable of ubiquitinating the
second; the first is a protease capable of cleaving the second; and so forth. The method
can further include detecting the modification at each address of the plurality.

These embodiments can be used to identify an interaction or to identify a 
compound that modulates, e.g., inhibits or enhances, an interaction. For example, the
method can further include contacting each address of the plurality with a compound,
e.g., a small organic molecule, a polypeptide, or a nucleic acid, to thereby determine if the
compound alters the interaction between the first and second amino acid sequences.

In one preferred embodiment, the first amino acid sequence is a drug candidate,
e.g., a random peptide, a randomized or mutated scaffold protein, or a secreted protein
(e.g., a cell surface protein, an ectodomain of a transmembrane protein, an antibody, or a
polypeptide hormone); and the second amino acid sequence is a drug target. A first
amino acid sequence at an address where an interaction between the first amino acid
sequence and the second amino acid sequence is detected can be used as a candidate amino acid
sequence for additional refinement or as a drug. The first amino acid sequence can be 
administered to a subject. A nucleic acid encoding the first amino acid sequence can be 
administered to a subject. In a related preferred embodiment, the first amino acid 
sequence is the drug target, and the second amino acid sequence is the drug candidate. 

In a preferred embodiment, each first amino acid sequence in the plurality of 
addresses is unique. For example, a first amino acid sequence can differ from all other 
test amino acid sequences of the plurality by one or more amino acid differences (e.g.,
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800, 
256, 128, 64, or 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the 
first amino acid sequence encoded by the nucleic acid at each address of the plurality is 
identical to all other first amino acid sequences in the plurality of addresses. In a
preferred embodiment, the affinity tag encoded by the first nucleic acid at each address of
the plurality is the same, or substantially identical to all other affinity tags in the plurality
of addresses. In another preferred embodiment, the first nucleic acid at each address of
the plurality encodes more than one affinity tag. In yet another preferred embodiment,
the affinity tag encoded by the first nucleic acid at an address of the plurality differs from
at least one other affinity tag in the plurality of addresses.
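
The uniqueness condition described above, in which a first amino acid sequence differs from every other sequence of the plurality by one or more amino acids, can be checked with a short sketch such as the following. It is illustrative only and rests on assumptions not stated in the text: the sequences are treated as equal-length strings that differ by substitution only, and the names `count_differences`, `all_unique`, and the example variants are hypothetical.

```python
from itertools import combinations
from typing import List


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length amino acid sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def all_unique(sequences: List[str], min_differences: int = 1) -> bool:
    """Return True if every pair differs at min_differences or more positions."""
    return all(
        count_differences(a, b) >= min_differences
        for a, b in combinations(sequences, 2)
    )


# Hypothetical variants of a scaffold sequence, one per address.
variants = ["MKTAYIAK", "MKTAYIAR", "MKSAYIAK"]
unique_by_one = all_unique(variants)                      # each pair differs somewhere
unique_by_two = all_unique(variants, min_differences=2)   # stricter requirement
```

Sequences that differ by insertions or deletions would need an alignment step first, which this sketch deliberately omits.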

In a preferred embodiment, the affinity tag is fused directly to the test amino acid
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another
preferred embodiment, the affinity tag is separated from the test amino acid sequence by one or
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids,
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or
carboxy-terminal to the test amino acid sequence.
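
The arrangements of affinity tag, optional linker, and test amino acid sequence described above can be illustrated with a small sketch. The function name `build_fusion`, the hexahistidine tag, the glycine/serine linker, and the test sequence are assumptions chosen for illustration and are not taken from the specification.

```python
def build_fusion(test_seq: str, affinity_tag: str, linker: str = "",
                 tag_position: str = "N") -> str:
    """Return a one-letter-code fusion of tag, optional linker, and test sequence.

    tag_position "N" places the tag (and linker) amino-terminal to the test
    sequence; "C" places them carboxy-terminal. An empty linker gives a
    direct fusion.
    """
    if tag_position == "N":
        return affinity_tag + linker + test_seq
    if tag_position == "C":
        return test_seq + linker + affinity_tag
    raise ValueError("tag_position must be 'N' or 'C'")


# Hypothetical components: a six-residue flexible linker falls within the
# "about 3 to 12 amino acids" range mentioned in the text.
tag = "HHHHHH"
linker = "GGSGGS"
test_seq = "MKTAYIAK"

n_terminal = build_fusion(test_seq, tag, linker, tag_position="N")
c_terminal = build_fusion(test_seq, tag, linker, tag_position="C")
direct = build_fusion(test_seq, tag)  # tag fused directly, no linker
```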

The first and/or second nucleic acid can be an RNA, or a DNA (e.g., a single-stranded
DNA, or a double-stranded DNA). In a preferred embodiment, the first and/or
second nucleic acid includes a plasmid DNA or a fragment thereof; an amplification
product (e.g., a product generated by RCA, PCR, or NASBA); or a synthetic DNA.

The first and/or second nucleic acid can further include one or more of: a
transcription promoter; a transcription regulatory sequence; an untranslated leader
sequence; a sequence encoding a cleavage site; a recombination site; a 3' untranslated
sequence; a transcriptional terminator; and an internal ribosome entry site. In one 
embodiment, the nucleic acid sequence includes a plurality of cistrons (also termed "open 
reading frames"), e.g., the sequence is dicistronic or polycistronic. In another 
embodiment, the nucleic acid also includes a sequence encoding a reporter protein, e.g., a 
protein whose abundance can be quantitated and can provide an indication of the quantity 
of test polypeptide fixed to the plate. The reporter protein can be attached to the test 
polypeptide, e.g., covalently attached, e.g., attached as a translational fusion. The 
reporter protein can be an enzyme, e.g., β-galactosidase, chloramphenicol acetyl
transferase, β-glucuronidase, and so forth. The reporter protein can produce or modulate
light, e.g., a fluorescent protein (e.g., green fluorescent protein, variants thereof, red
fluorescent protein, variants thereof, and the like), or luciferase.

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter, 
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase 
promoter. The regulatory components, e.g., the transcription promoter, can vary among 
nucleic acids at different addresses of the plurality. For example, different promoters can 
be used to vary the amount of polypeptide produced at different addresses. 

In one embodiment, the first and/or second nucleic acid also includes at least one 
site for recombination, e.g., homologous recombination or site-specific recombination, 
e.g., a lambda att site or variant thereof; a lox site; or a FLP site. In a preferred
embodiment, the recombination site lacks stop codons in the reading frame of a nucleic
acid encoding a test amino acid sequence. In another preferred embodiment, the
recombination site includes a stop codon in the reading frame of a nucleic acid encoding
a test amino acid sequence.

In another embodiment, the first and/or second nucleic acid includes a sequence
encoding a cleavage site, e.g., a protease site, e.g., a site cleaved by a site-specific
protease (e.g., a thrombin site, an enterokinase site, a PreScission site, a factor Xa site, or
a TEV site), or a chemical cleavage site (e.g., a methionine, preferably a unique
methionine (cleavage by cyanogen bromide) or a proline (cleavage by formic acid)).

The first nucleic acid can include a sequence encoding a second polypeptide tag 
in addition to the affinity tag. The second tag can be C-terminal to the test amino acid 
sequence and the affinity tag can be N-terminal to the test amino acid sequence; the 
second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be 
C-terminal to the test amino acid sequence; the second tag and the affinity tag can be 
adjacent to one another, or separated by a linker sequence, both being N-terminal or C- 
terminal to the test amino acid sequence. In one embodiment, the second tag is an 
additional affinity tag, e.g., the same or different from the first tag. In another 
embodiment, the second tag is a recognition tag. For example, the recognition tag can 
report the presence and/or amount of test polypeptide at an address. Preferably the 
recognition tag has a sequence other than the sequence of the affinity tag. In still another 
embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20
tags) are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality
can be the same as or different from the first affinity tag. 

The first and/or second nucleic acid sequence can further include an identifier 
sequence, e.g., a non-coding nucleic acid sequence, e.g., one that is synthetically inserted
and allows for uniquely identifying the nucleic acid sequence. The identifier sequence
can be sufficient in length to uniquely identify each sequence in the plurality; e.g., it is 
about 5 to 500, 10 to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier 
can be selected so that it is not complementary or identical to another identifier or any 
region of each nucleic acid sequence of the plurality on the array. 
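
The selection rule for identifier sequences stated above (neither identical nor complementary to another identifier or to any region of a nucleic acid on the array) can be sketched as a simple screen. This is a minimal illustration under assumptions that are not part of the text: only exact, full-length matches of the candidate and of its reverse complement are tested, the array sequences are examined before the identifier is inserted, and the names `reverse_complement` and `identifier_is_acceptable` are hypothetical.

```python
from typing import Iterable

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def identifier_is_acceptable(candidate: str,
                             accepted_identifiers: Iterable[str],
                             array_sequences: Iterable[str]) -> bool:
    """Reject a candidate that matches, or is complementary to, an accepted
    identifier or any region of a nucleic acid sequence on the array."""
    rc = reverse_complement(candidate)
    for other in accepted_identifiers:
        if candidate == other or rc == other:
            return False
    for seq in array_sequences:
        if candidate in seq or rc in seq:
            return False
    return True


candidate = "AAGGCCTTAC"  # hypothetical 10-nucleotide identifier
ok = identifier_is_acceptable(candidate, ["ACACACACAC"], ["GGGGGGGGGGGG"])
clash_identifier = identifier_is_acceptable(candidate, ["GTAAGGCCTT"], [])
clash_region = identifier_is_acceptable(candidate, [], ["TTTAAGGCCTTACTTT"])
```

A practical screen would likely also penalize partial complementarity, for example by scoring hybridization propensity, which this sketch does not attempt.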

The first and/or second amino acid sequence can further include a protein splicing 
sequence or intein. The intein can be inserted in the middle of a test amino acid
sequence. The intein can be a naturally-occurring intein or a mutated intein.

The first and/or second nucleic acid sequences encoding the first and/or second
amino acid sequences can be obtained from a collection of full-length expressed genes
(e.g., a repository of clones), a cDNA library, or a genomic library. The first and/or
second nucleic acid sequences can be nucleic acids expressed in a tissue, e.g., a normal or
diseased tissue. The first and/or second amino acid sequences can be mutants or variants
of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide hormone, etc.). In yet
another embodiment, they are random amino acid sequences, patterned amino acid
sequences, or designed amino acid sequences (e.g., sequences designed by manual,
rational, or computer-aided approaches). 

The binding agent can be attached to the substrate. For example, the substrate can 
be derivatized and the binding agent covalently attached thereto. The binding agent can be
attached via a bridging moiety, e.g., a specific binding pair (e.g., the substrate contains a
first member of a specific binding pair, and the binding agent is linked to the second 
member of the binding pair, the second member being attached to the substrate). 

In yet another embodiment, an insoluble substrate (e.g., a bead or particle) is
disposed at each address of the plurality, and the binding agent is attached to the
insoluble substrate. The insoluble substrate can further contain information encoding its
identity, e.g., a reference to the address on which it is disposed. The insoluble substrate
can be tagged using a chemical tag, or an electronic tag (e.g., a transponder). The
insoluble substrate can be disposed such that it can be removed for later analysis. 

In another aspect, the invention features a method of evaluating, e.g., identifying, a
polypeptide-polypeptide interaction. The method includes: (1) providing or obtaining an 
array made by the following process: (A) providing or obtaining a substrate with a 
plurality of addresses, each address having a binding agent that recognizes an affinity tag; 
(B) disposing in or on each address of the plurality (i) a first nucleic acid encoding an 
amino acid sequence comprising a first amino acid sequence and the affinity tag, and (ii) 
a second nucleic acid encoding a second amino acid sequence; and, optionally, (C)
contacting each address of the plurality with a translation effector to thereby translate the
first and second nucleic acids.

The method can further include maintaining the substrate under conditions
permissive for the hybrid amino acid sequence to bind the binding agent. The method can
further include detecting the presence of the second amino acid sequence at each of the
plurality of addresses.

In one preferred embodiment, the first amino acid sequence is common to all
addresses of the plurality, and a second test amino acid sequence is unique among all the
addresses of the plurality. For example, the second test amino acid sequences can be
query sequences whereas the first test amino acid sequence can be a target
sequence. In another preferred embodiment, the first amino acid sequence is unique
among all the addresses of the plurality, and the second amino acid sequence is common
to all addresses of the plurality. For example, the first test amino acid sequences can be
query sequences whereas the second test amino acid sequence can be a target
sequence. The second nucleic acid encoding the second test amino acid sequence can 
include a sequence encoding a recognition tag and/or an affinity tag. 

The method can further include detecting the presence of the second amino acid 
sequence at each of the plurality of addresses. 

In one preferred embodiment, the second nucleic acid sequence also encodes a 
polypeptide tag. The polypeptide tag can be an epitope (e.g., recognized by a monoclonal 
antibody), or a binding agent (e.g., avidin or streptavidin, GST, or chitin binding protein). 
The detection of the second amino acid sequence can entail contacting each address of
the plurality with a binding agent, e.g., a labeled biotin moiety, labeled glutathione,
labeled chitin, a labeled antibody, etc. In another embodiment, each address of the 
plurality is contacted with an antibody specific to the second amino acid sequence. 

In another preferred embodiment, the second nucleic acid sequence includes a 
recognition tag. The recognition tag can be an epitope tag, enzyme or fluorescent 
protein. Examples of enzymes include horseradish peroxidase, alkaline phosphatase, 
luciferase, or cephalosporinase. The method can further include contacting each address 
of the plurality with an appropriate cofactor and/or substrate for the enzyme. Examples 
of fluorescent proteins include green fluorescent protein (GFP), and variants thereof, e.g., 
enhanced GFP, blue fluorescent protein (BFP), cyan FP, etc. The detection of the 
second amino acid sequence can entail monitoring fluorescence, assessing enzyme
activity, or measuring an added binding agent, e.g., a labeled biotin moiety, a labeled
antibody, etc.

In another preferred embodiment, one of the first and second amino acid sequences
is capable of modifying the other (e.g., making or breaking a bond, preferably a covalent
bond, of the other). For example, the first amino acid sequence is a kinase capable of
phosphorylating the second amino acid sequence; the first is a methylase capable of
methylating the second; the first is a ubiquitin ligase capable of ubiquitinating the
second; the first is a protease capable of cleaving the second; and so forth. The method
can further include detecting the modification at each address of the plurality.

These embodiments can be used to identify an interaction or to identify a 
compound that modulates, e.g., inhibits or enhances, an interaction. For example, the 
method can further include contacting each address of the plurality with a compound, 
e.g., a small organic molecule, a polypeptide, or a nucleic acid, to thereby determine if the
compound alters the interaction between the first and second amino acid sequences.
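
One way to decide whether a compound alters an interaction is to compare the detection signal at an address with and without the compound. The sketch below is purely illustrative: the signal values, the two-fold threshold, and the name `classify_compound_effect` are assumptions and are not specified in the text.

```python
def classify_compound_effect(signal_without: float, signal_with: float,
                             fold_threshold: float = 2.0) -> str:
    """Classify a compound by the fold change it causes in an interaction signal.

    signal_without: signal at the address in the absence of the compound.
    signal_with: signal at the same address in the presence of the compound.
    fold_threshold: hypothetical cutoff for calling a change meaningful.
    """
    if signal_without <= 0 or signal_with < 0:
        raise ValueError("signals must be non-negative and the baseline positive")
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must be greater than 1")
    ratio = signal_with / signal_without
    if ratio >= fold_threshold:
        return "enhances"
    if ratio <= 1.0 / fold_threshold:
        return "inhibits"
    return "no clear effect"


# Hypothetical background-subtracted signals in arbitrary units.
effect_a = classify_compound_effect(100.0, 20.0)
effect_b = classify_compound_effect(100.0, 250.0)
effect_c = classify_compound_effect(100.0, 120.0)
```

In practice the cutoff would be set from replicate measurements and controls rather than fixed in advance.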

In one preferred embodiment, the first amino acid sequence is a drug candidate, 
e.g., a random peptide, a randomized or mutated scaffold protein, or a secreted protein
(e.g., a cell surface protein, an ectodomain of a transmembrane protein, an antibody, or a
polypeptide hormone); and the second amino acid sequence is a drug target. A first
amino acid sequence at an address where an interaction between the first amino acid
sequence and the second amino acid sequence is detected can be used as a candidate amino acid
sequence for additional refinement or as a drug. The first amino acid sequence can be
administered to a subject. A nucleic acid encoding the first amino acid sequence can be 
administered to a subject. In a related preferred embodiment, the first amino acid 
sequence is the drug target, and the second amino acid sequence is the drug candidate. 

In a preferred embodiment, each first amino acid sequence in the plurality of 
addresses is unique. For example, a first amino acid sequence can differ from all other 
test amino acid sequences of the plurality by one or more amino acid differences (e.g.,
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800, 
256, 128, 64, or 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the 
first amino acid sequence encoded by the nucleic acid at each address of the plurality is 
identical to all other first amino acid sequences in the plurality of addresses. In a
preferred embodiment, the affinity tag encoded by the first nucleic acid at each address of
the plurality is the same, or substantially identical to all other affinity tags in the plurality
of addresses. In another preferred embodiment, the first nucleic acid at each address of
the plurality encodes more than one affinity tag. In yet another preferred embodiment,
the affinity tag encoded by the first nucleic acid at an address of the plurality differs from
at least one other affinity tag in the plurality of addresses.

In a preferred embodiment, the affinity tag is fused directly to the test amino acid
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another
preferred embodiment, the affinity tag is separated from the test amino acid sequence by one or
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, 
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can 
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably 
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or 
carboxy-terminal to the test amino acid sequence. 

The first and/or second nucleic acid can be an RNA, or a DNA (e.g., a single-stranded
DNA, or a double-stranded DNA). In a preferred embodiment, the first and/or
second nucleic acid includes a plasmid DNA or a fragment thereof; an amplification
product (e.g., a product generated by RCA, PCR, or NASBA); or a synthetic DNA.

The first and/or second nucleic acid can further include one or more of: a 
transcription promoter; a transcription regulatory sequence; an untranslated leader
sequence; a sequence encoding a cleavage site; a recombination site; a 3' untranslated
sequence; a transcriptional terminator; and an internal ribosome entry site. In one 
embodiment, the nucleic acid sequence includes a plurality of cistrons (also termed "open 
reading frames"), e.g., the sequence is dicistronic or polycistronic. In another 
embodiment, the nucleic acid also includes a sequence encoding a reporter protein, e.g., a 
protein whose abundance can be quantitated and can provide an indication of the quantity 
of test polypeptide fixed to the plate. The reporter protein can be attached to the test 
polypeptide, e.g., covalently attached, e.g., attached as a translational fusion. The 
reporter protein can be an enzyme, e.g., β-galactosidase, chloramphenicol acetyl
transferase, β-glucuronidase, and so forth. The reporter protein can produce or modulate
light, e.g., a fluorescent protein (e.g., green fluorescent protein, variants thereof, red
fluorescent protein, variants thereof, and the like), or luciferase.

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter,
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase
promoter. The regulatory components, e.g., the transcription promoter, can vary among
nucleic acids at different addresses of the plurality. For example, different promoters can
be used to vary the amount of polypeptide produced at different addresses.

In one embodiment, the first and/or second nucleic acid also includes at least one
site for recombination, e.g., homologous recombination or site-specific recombination,
e.g., a lambda att site or variant thereof; a lox site; or a FLP site. In a preferred
embodiment, the recombination site lacks stop codons in the reading frame of a nucleic 
acid encoding a test amino acid sequence. In another preferred embodiment, the 
recombination site includes a stop codon in the reading frame of a nucleic acid encoding 
a test amino acid sequence. 

In another embodiment, the first and/or second nucleic acid includes a sequence 
encoding a cleavage site, e.g., a protease site, e.g., a site cleaved by a site-specific 
protease (e.g., a thrombin site, an enterokinase site, a PreScission site, a factor Xa site, or 
a TEV site), or a chemical cleavage site (e.g., a methionine, preferably a unique 
methionine (cleavage by cyanogen bromide) or a proline (cleavage by formic acid)). 

The first nucleic acid can include a sequence encoding a second polypeptide tag 
in addition to the affinity tag. The second tag can be C-terminal to the test amino acid
sequence and the affinity tag can be N-terminal to the test amino acid sequence; the
second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be 
C-terminal to the test amino acid sequence; the second tag and the affinity tag can be 
adjacent to one another, or separated by a linker sequence, both being N-terminal or C- 
terminal to the test amino acid sequence. In one embodiment, the second tag is an 
additional affinity tag, e.g., the same or different from the first tag. In another 
embodiment, the second tag is a recognition tag. For example, the recognition tag can 
report the presence and/or amount of test polypeptide at an address. Preferably the 
recognition tag has a sequence other than the sequence of the affinity tag. In still another 
embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20
tags) are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality
can be the same as or different from the first affinity tag.

The first and/or second nucleic acid sequence can further include an identifier 
sequence, e.g., a non-coding nucleic acid sequence, e.g., one that is synthetically inserted
and allows for uniquely identifying the nucleic acid sequence. The identifier sequence
can be sufficient in length to uniquely identify each sequence in the plurality; e.g., it is
about 5 to 500, 10 to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier
can be selected so that it is not complementary or identical to another identifier or any
region of each nucleic acid sequence of the plurality on the array.

The first and/or second amino acid sequence can further include a protein splicing 
sequence or intein. The intein can be inserted in the middle of a test amino acid 
sequence. The intein can be a naturally-occurring intein or a mutated intein. 

The first and/or second nucleic acid sequences encoding the first and/or second 
amino acid sequences can be obtained from a collection of full-length expressed genes 
(e.g., a repository of clones), a cDNA library, or a genomic library. The first and/or 
second nucleic acid sequences can be nucleic acids expressed in a tissue, e.g., a normal or 
diseased tissue. The first and/or second amino acid sequences can be mutants or variants 
of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide hormone, etc.). In yet
another embodiment, they are random amino acid sequences, patterned amino acid
sequences, or designed amino acid sequences (e.g., sequences designed by manual,
rational, or computer-aided approaches).

The binding agent can be attached to the substrate. For example, the substrate can
be derivatized and the binding agent covalently attached thereto. The binding agent can be
attached via a bridging moiety, e.g., a specific binding pair (e.g., the substrate contains a
first member of a specific binding pair, and the binding agent is linked to the second 
member of the binding pair, the second member being attached to the substrate). 

In yet another embodiment, an insoluble substrate (e.g., a bead or particle) is
disposed at each address of the plurality, and the binding agent is attached to the
insoluble substrate. The insoluble substrate can further contain information encoding its
identity, e.g., a reference to the address on which it is disposed. The insoluble substrate
can be tagged using a chemical tag, or an electronic tag (e.g., a transponder). The
insoluble substrate can be disposed such that it can be removed for later analysis.

In another aspect, the invention features a method of evaluating, e.g., identifying, a
polypeptide-polypeptide interaction. The method includes: (1) providing or obtaining an
array made by the following production method: (A) providing or obtaining a substrate
with a plurality of addresses, each address of the plurality comprising (i) a first nucleic
acid encoding a hybrid amino acid sequence comprising a first amino acid sequence and
an affinity tag, (ii) a binding agent that recognizes the affinity tag, and (iii) a second
nucleic acid encoding a second amino acid sequence; and (B) contacting each address of
the plurality with a translation effector to thereby translate the first and second nucleic
acid sequences. The evaluation method further includes: (2) at each of the plurality of
addresses, detecting at least one parameter selected from the group consisting of: (i) the 
proximity of the second amino acid sequence to the first amino acid sequence; (ii) the 
proximity of the second amino acid sequence to the substrate or a compound bound 
thereto; (iii) the rotational freedom of the second amino acid sequence; and (iv) the 
refractive index of the substrate. The evaluation method can optionally include, e.g.,
prior to the detecting step, (3) maintaining the substrate under conditions permissive for
the hybrid amino acid sequence to bind the binding agent.

The method can further include washing the substrate prior to the detection step. 
The stringency of the wash step can be adjusted in order to remove the translation 
effector and non-specifically bound proteins.

In one preferred embodiment, the first amino acid sequence is common to all
addresses of the plurality, and a second test amino acid sequence is unique among all the
addresses of the plurality. For example, the second test amino acid sequences can be
query sequences whereas the first test amino acid sequence can be a target
sequence. In another preferred embodiment, the first amino acid sequence is unique
among all the addresses of the plurality, and the second amino acid sequence is common
to all addresses of the plurality. For example, the first test amino acid sequences can be
query sequences whereas the second test amino acid sequence can be a target
sequence. The second nucleic acid encoding the second test amino acid sequence can
include a sequence encoding a recognition tag and/or an affinity tag.

The method can further include detecting the presence of the second amino acid
sequence at each of the plurality of addresses.

In one preferred embodiment, the second nucleic acid sequence also encodes a
polypeptide tag. The polypeptide tag can be an epitope (e.g., recognized by a monoclonal
antibody), or a binding agent (e.g., avidin or streptavidin, GST, or chitin binding protein).
The detection of the second amino acid sequence can entail contacting each address of
the plurality with a binding agent, e.g., a labeled biotin moiety, labeled glutathione,
labeled chitin, a labeled antibody, etc. In another embodiment, each address of the
plurality is contacted with an antibody specific to the second amino acid sequence. The
antibody can be labeled, e.g., with a fluorophore.

In another preferred embodiment, the second nucleic acid sequence includes a 
recognition tag. The recognition tag can be an epitope tag, enzyme or fluorescent 
protein. Examples of enzymes include horseradish peroxidase, alkaline phosphatase, 
luciferase, or cephalosporinase. The method can further include contacting each address 
of the plurality with an appropriate cofactor and/or substrate for the enzyme. Examples 
of fluorescent proteins include green fluorescent protein (GFP), and variants thereof, e.g., 
enhanced GFP, blue fluorescent protein (BFP), cyan FP, etc. 

The method can further include contacting each address of the plurality with a 
compound, e.g., a small organic molecule, a polypeptide, or a nucleic acid to thereby 
determine if the compound alters the interaction between the first and second amino acid
sequences.

In one preferred embodiment, the first amino acid sequence is a drug candidate,
e.g., a random peptide, a randomized or mutated scaffold protein, or a secreted protein
(e.g., a cell surface protein, an ectodomain of a transmembrane protein, an antibody, or a
polypeptide hormone); and the second amino acid sequence is a drug target. A first
amino acid sequence at an address where an interaction between the first amino acid
sequence and the second amino acid sequence is detected can be used as a candidate amino
acid sequence for additional refinement or as a drug. The first amino acid sequence can be
administered to a subject. A nucleic acid encoding the first amino acid sequence can be 
administered to a subject. In a related preferred embodiment, the first amino acid 
sequence is the drug target, and the second amino acid sequence is the drug candidate. 

In a preferred embodiment, each first amino acid sequence in the plurality of 
addresses is unique. For example, a first amino acid sequence can differ from all other
first amino acid sequences of the plurality by one or more amino acid differences (e.g.,
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800,
256, 128, 64, 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the
first amino acid sequence encoded by the nucleic acid at each address of the plurality is 
identical to all other first amino acid sequences in the plurality of addresses. In a 
preferred embodiment, the affinity tag encoded by the first nucleic acid at each address of 
the plurality is the same, or substantially identical to all other affinity tags in the plurality 
of addresses. In another preferred embodiment, the first nucleic acid at each address of 
the plurality encodes more than one affinity tag. In yet another preferred embodiment, 
the affinity tag encoded by the first nucleic acid at an address of the plurality differs from 
at least one other affinity tag in the plurality of addresses. 
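
By way of illustration only, the uniqueness condition above (each first amino acid
sequence differing from every other by one or more amino acids) can be checked with a
short sketch such as the following. The function names are hypothetical, and the sketch
assumes equal-length sequences compared position by position, which the text does not
require.

```python
from itertools import combinations


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length amino acid sequences differ."""
    if len(seq_a) != len(seq_b):
        # Assumption of this sketch: only same-length variants are compared.
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def min_pairwise_differences(sequences: list[str]) -> int:
    """Smallest number of differences between any two sequences in the set."""
    return min(count_differences(a, b) for a, b in combinations(sequences, 2))


def all_unique(sequences: list[str]) -> bool:
    """True if every sequence differs from every other by at least one residue."""
    return min_pairwise_differences(sequences) >= 1
```

A set of scaffold variants would pass this check whenever no two addresses carry the
same sequence.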

In a preferred embodiment, the affinity tag is fused directly to the test amino acid 
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another 
preferred embodiment, the affinity tag is separated from the test amino acid by one or 
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, 
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can 
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably 
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or 
carboxy-terminal to the test amino acid sequence.

The first and/or second nucleic acid can be an RNA or a DNA (e.g., a single-stranded
DNA or a double-stranded DNA). In a preferred embodiment, the first and/or
second nucleic acid includes a plasmid DNA or a fragment thereof; an amplification
product (e.g., a product generated by RCA, PCR, NASBA); or a synthetic DNA.

The first and/or second nucleic acid can further include one or more of: a
transcription promoter; a transcription regulatory sequence; an untranslated leader
sequence; a sequence encoding a cleavage site; a recombination site; a 3' untranslated
sequence; a transcriptional terminator; and an internal ribosome entry site. In one
embodiment, the nucleic acid sequence includes a plurality of cistrons (also termed "open
reading frames"), e.g., the sequence is dicistronic or polycistronic. In another
embodiment, the nucleic acid also includes a sequence encoding a reporter protein, e.g., a
protein whose abundance can be quantitated and can provide an indication of the quantity
of test polypeptide fixed to the plate. The reporter protein can be attached to the test
polypeptide, e.g., covalently attached, e.g., attached as a translational fusion. The
reporter protein can be an enzyme, e.g., β-galactosidase, chloramphenicol acetyl
transferase, β-glucuronidase, and so forth. The reporter protein can produce or modulate
light, e.g., a fluorescent protein (e.g., green fluorescent protein, variants thereof, red
fluorescent protein, variants thereof, and the like), and luciferase.

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter, 
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase 
promoter. The regulatory components, e.g., the transcription promoter, can vary among 
nucleic acids at different addresses of the plurality. For example, different promoters can 
be used to vary the amount of polypeptide produced at different addresses. 

In one embodiment, the first and/or second nucleic acid also includes at least one 
site for recombination, e.g., homologous recombination or site-specific recombination, 
e.g., a lambda att site or variant thereof; a lox site; or a FLP site. In a preferred 
embodiment, the recombination site lacks stop codons in the reading frame of a nucleic 
acid encoding a test amino acid sequence. In another preferred embodiment, the 
recombination site includes a stop codon in the reading frame of a nucleic acid encoding 
a test amino acid sequence.

In another embodiment, the first and/or second nucleic acid includes a sequence
encoding a cleavage site, e.g., a protease site, e.g., a site cleaved by a site-specific
protease (e.g., a thrombin site, an enterokinase site, a PreScission site, a factor Xa site, or
a TEV site), or a chemical cleavage site (e.g., a methionine, preferably a unique
methionine (cleavage by cyanogen bromide) or a proline (cleavage by formic acid)).

The first nucleic acid can include a sequence encoding a second polypeptide tag 
in addition to the affinity tag. The second tag can be C-terminal to the test amino acid 
sequence and the affinity tag can be N-terminal to the test amino acid sequence; the 
second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be 
C-terminal to the test amino acid sequence; the second tag and the affinity tag can be 
adjacent to one another, or separated by a linker sequence, both being N-terminal or C- 
terminal to the test amino acid sequence. In one embodiment, the second tag is an 
additional affinity tag, e.g., the same or different from the first tag. In another 
embodiment, the second tag is a recognition tag. For example, the recognition tag can 
report the presence and/or amount of test polypeptide at an address. Preferably the 
recognition tag has a sequence other than the sequence of the affinity tag. In still another 
embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20 
tags) are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality 
can be the same as or different from the first affinity tag. 

The first and/or second nucleic acid sequence can further include an identifier 
sequence, e.g., a non-coding nucleic acid sequence, e.g., one that is synthetically inserted
and allows for uniquely identifying the nucleic acid sequence. The identifier sequence
can be sufficient in length to uniquely identify each sequence in the plurality; e.g., it is 
about 5 to 500, 10 to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier 
can be selected so that it is not complementary or identical to another identifier or any 
region of each nucleic acid sequence of the plurality on the array. 
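
As a non-limiting sketch of how such identifiers might be selected, the following
routine draws random candidates and rejects any that is identical or reverse-complementary
to an identifier already chosen, or that occurs (in either orientation) within any nucleic
acid sequence of the plurality. The function names, the default length of 12 nucleotides,
and the use of exact matching rather than partial complementarity are assumptions of this
sketch and are not taken from the description.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def pick_identifiers(array_sequences: list[str], length: int = 12, seed: int = 0) -> list[str]:
    """Pick one identifier per arrayed sequence (hypothetical helper).

    A candidate is rejected if it is self-complementary, matches or is the
    reverse complement of an identifier already chosen, or appears in either
    orientation inside any sequence on the array.
    """
    rng = random.Random(seed)
    chosen: list[str] = []
    for _ in array_sequences:
        while True:
            candidate = "".join(rng.choice("ACGT") for _ in range(length))
            rc = reverse_complement(candidate)
            clashes_identifier = any(candidate == c or rc == c for c in chosen)
            clashes_region = any(candidate in s or rc in s for s in array_sequences)
            if candidate != rc and not clashes_identifier and not clashes_region:
                chosen.append(candidate)
                break
    return chosen
```

Exact matching is the simplest possible criterion; a practical design would likely also
screen for partial complementarity, which this sketch does not attempt.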

The first and/or second amino acid sequence can further include a protein splicing 
sequence or intein. The intein can be inserted in the middle of a test amino acid 
sequence. The intein can be a naturally-occurring intein or a mutated intein. 

The first and/or second nucleic acid sequences encoding the first and/or second 
amino acid sequences can be obtained from a collection of full-length expressed genes
(e.g., a repository of clones), a cDNA library, or a genomic library. The first and/or
second nucleic acid sequences can be nucleic acids expressed in a tissue, e.g., a normal or 
diseased tissue. The first and/or second amino acid sequences can be mutants or variants 
of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide hormone etc.). In yet 
another embodiment, they are random amino acid sequences, patterned amino acids 
sequences, or designed amino acids sequences (e.g., sequence designed by manual, 
rational, or computer-aided approaches). 

The binding agent can be attached to the substrate. For example, the substrate can 
be derivatized and the binding agent covalently attached thereto. The binding agent can be
attached via a bridging moiety, e.g., a specific binding pair (e.g., the substrate contains a
first member of a specific binding pair, and the binding agent is linked to the second
member of the binding pair, the second member being attached to the substrate).

In yet another embodiment, an insoluble substrate (e.g., a bead or particle) is disposed at
each address of the plurality, and the binding agent is attached to the insoluble substrate. 
The insoluble substrate can further contain information encoding its identity, e.g., a 
reference to the address on which it is disposed. The insoluble substrate can be tagged 
using a chemical tag, or an electronic tag (e.g., a transponder). The insoluble substrate 
can be disposed such that it can be removed for later analysis. 

In another aspect, the invention features a method of identifying an enzyme
substrate or cofactor. The method includes: (1) providing a substrate with a plurality of
addresses, each address of the plurality comprising (i) a first nucleic acid encoding a
hybrid amino acid sequence comprising a first amino acid sequence and an affinity tag,
(ii) a binding agent that recognizes the affinity tag and is attached to the substrate, and
(iii) a second nucleic acid encoding an enzyme; (2) contacting each address of the
plurality with a translation effector to thereby translate the first and second nucleic acid
sequences; (3) maintaining the substrate under conditions permissive for the hybrid
amino acid sequence to bind the binding agent and for activity of the enzyme; and (4)
detecting the activity of the enzyme at each address of the plurality.

In one embodiment, the first amino acid sequence varies among the addresses of 
the plurality. In another embodiment, the second nucleic acid varies among the addresses 
of the plurality.

The method can further include contacting each address of the plurality with an
enzyme substrate (e.g., a radioactive or otherwise labeled substrate, such as ATP, GTP,
S-adenosylmethionine, ubiquitin, and so forth) or a cofactor, e.g., NADH, NADPH, FAD.
A substrate or cofactor can be provided with the translation effector.

The detecting step can include monitoring a protein bound by the labeled binding
agent (radioactive or otherwise), e.g., after a wash step. The label can be present in
solution (e.g., as a cofactor or reaction substrate) and can be transferred to the first amino
acid sequence by the enzyme, e.g., such that the label is covalently attached to the first
amino acid sequence (e.g., such as in phosphorylation). The label can be present in
solution and can be bound to the first amino acid sequence (e.g., non-covalently) as a
result of an enzyme catalyzed or assisted reaction (e.g., the enzyme can effect a
conformational change in the first amino acid sequence, such as a GTP exchange factor
protein acting on a GTP binding protein).

In one preferred embodiment, the first amino acid sequence is common to all
addresses of the plurality, and a second test amino acid sequence is unique among all the
addresses of the plurality. For example, the second test amino acid sequences can be
query sequences whereas the first test amino acid sequence can be a target
sequence. In another preferred embodiment, the first amino acid sequence is unique
among all the addresses of the plurality, and the second amino acid sequence is common
to all addresses of the plurality. For example, the first test amino acid sequences can be
query sequences whereas the second test amino acid sequence can be a target
sequence. The second nucleic acid encoding the second test amino acid sequence can
include a sequence encoding a recognition tag and/or an affinity tag.

In a preferred embodiment, each first amino acid sequence in the plurality of 
addresses is unique. For example, a first amino acid sequence can differ from all other 
test amino acid sequence of the plurality by 1, or more amino acid differences, (e.g., 
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800, 
256, 128, 64, or 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the 
first amino acid sequence encoded by the nucleic acid at each address of the plurality is 
identical to all other first amino acid sequences in the plurality of addresses. In a 
preferred embodiment, the affinity tag encoded by the first nucleic acid at each address of
the plurality is the same, or substantially identical to all other affinity tags in the plurality
of addresses. In another preferred embodiment, the first nucleic acid at each address of
the plurality encodes more than one affinity tag. In yet another preferred embodiment,
the affinity tag encoded by the first nucleic acid at an address of the plurality differs from
at least one other affinity tag in the plurality of addresses.

In a preferred embodiment, the affinity tag is fused directly to the test amino acid
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another
preferred embodiment, the affinity tag is separated from the test amino acid by one or
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids,
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or
carboxy-terminal to the test amino acid sequence.

The first and/or second nucleic acid can be an RNA or a DNA (e.g., a single-stranded
DNA or a double-stranded DNA). In a preferred embodiment, the first and/or
second nucleic acid includes a plasmid DNA or a fragment thereof; an amplification
product (e.g., a product generated by RCA, PCR, NASBA); or a synthetic DNA.

The first and/or second nucleic acid can further include one or more of: a
transcription promoter; a transcription regulatory sequence; an untranslated leader
sequence; a sequence encoding a cleavage site; a recombination site; a 3' untranslated
sequence; a transcriptional terminator; and an internal ribosome entry site. In one
embodiment, the nucleic acid sequence includes a plurality of cistrons (also termed "open
reading frames"), e.g., the sequence is dicistronic or polycistronic. In another
embodiment, the nucleic acid also includes a sequence encoding a reporter protein, e.g., a
protein whose abundance can be quantitated and can provide an indication of the quantity
of test polypeptide fixed to the plate. The reporter protein can be attached to the test
polypeptide, e.g., covalently attached, e.g., attached as a translational fusion. The
reporter protein can be an enzyme, e.g., β-galactosidase, chloramphenicol acetyl
transferase, β-glucuronidase, and so forth. The reporter protein can produce or modulate
light, e.g., a fluorescent protein (e.g., green fluorescent protein, variants thereof, red
fluorescent protein, variants thereof, and the like), and luciferase.

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter,
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase
promoter. The regulatory components, e.g., the transcription promoter, can vary among
nucleic acids at different addresses of the plurality. For example, different promoters can
be used to vary the amount of polypeptide produced at different addresses.

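In one embodiment, the first and/or second nucleic acid also includes at least one
site for recombination, e.g., homologous recombination or site-specific recombination,
e.g., a lambda att site or variant thereof; a lox site; or a FLP site. In a preferred
embodiment, the recombination site lacks stop codons in the reading frame of a nucleic
acid encoding a test amino acid sequence. In another preferred embodiment, the
recombination site includes a stop codon in the reading frame of a nucleic acid encoding
a test amino acid sequence.

In another embodiment, the first and/or second nucleic acid includes a sequence
encoding a cleavage site, e.g., a protease site, e.g., a site cleaved by a site-specific
protease (e.g., a thrombin site, an enterokinase site, a PreScission site, a factor Xa site, or
a TEV site), or a chemical cleavage site (e.g., a methionine, preferably a unique
methionine (cleavage by cyanogen bromide) or a proline (cleavage by formic acid)).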

The first nucleic acid can include a sequence encoding a second polypeptide tag
in addition to the affinity tag. The second tag can be C-terminal to the test amino acid
sequence and the affinity tag can be N-terminal to the test amino acid sequence; the
second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be
C-terminal to the test amino acid sequence; the second tag and the affinity tag can be
adjacent to one another, or separated by a linker sequence, both being N-terminal or
C-terminal to the test amino acid sequence. In one embodiment, the second tag is an
additional affinity tag, e.g., the same or different from the first tag. In another
embodiment, the second tag is a recognition tag. For example, the recognition tag can
report the presence and/or amount of test polypeptide at an address. Preferably the
recognition tag has a sequence other than the sequence of the affinity tag. In still another
embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20
tags) are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality
can be the same as or different from the first affinity tag.

The first and/or second nucleic acid sequence can further include an identifier
sequence, e.g., a non-coding nucleic acid sequence, e.g., one that is synthetically inserted
and allows for uniquely identifying the nucleic acid sequence. The identifier sequence
can be sufficient in length to uniquely identify each sequence in the plurality; e.g., it is
about 5 to 500, 10 to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier
can be selected so that it is not complementary or identical to another identifier or any
region of each nucleic acid sequence of the plurality on the array.

The first and/or second amino acid sequence can further include a protein splicing
sequence or intein. The intein can be inserted in the middle of a test amino acid
sequence. The intein can be a naturally-occurring intein or a mutated intein.

The first and/or second nucleic acid sequences encoding the first and/or second
amino acid sequences can be obtained from a collection of full-length expressed genes
(e.g., a repository of clones), a cDNA library, or a genomic library. The first and/or
second nucleic acid sequences can be nucleic acids expressed in a tissue, e.g., a normal or
diseased tissue. The first and/or second amino acid sequences can be mutants or variants
of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide hormone etc.). In yet
another embodiment, they are random amino acid sequences, patterned amino acids
sequences, or designed amino acids sequences (e.g., sequence designed by manual,
rational, or computer-aided approaches).

The binding agent can be attached to the substrate. For example, the substrate can
be derivatized and the binding agent covalently attached thereto. The binding agent can be
attached via a bridging moiety, e.g., a specific binding pair (e.g., the substrate contains a
first member of a specific binding pair, and the binding agent is linked to the second
member of the binding pair, the second member being attached to the substrate).

In yet another embodiment, an insoluble substrate (e.g., a bead or particle) is disposed at
each address of the plurality, and the binding agent is attached to the insoluble substrate.
The insoluble substrate can further contain information encoding its identity, e.g., a
reference to the address on which it is disposed. The insoluble substrate can be tagged
using a chemical tag, or an electronic tag (e.g., a transponder). The insoluble substrate
can be disposed such that it can be removed for later analysis.



Attorney Docket No: 00246-260001 /HI 803



In another aspect, the invention features a method of producing a protein-interaction
map for a plurality of amino acid sequences. The method includes: (1)
providing (i) a first plurality of nucleic acid sequences, each encoding an amino acid
sequence comprising an amino acid sequence of the plurality of amino acid sequences
and an affinity tag; (ii) a second plurality of nucleic acid sequences, each encoding an amino
acid sequence comprising an amino acid sequence of the plurality of amino acid sequences
and a recognition tag; and (iii) a substrate with a plurality of addresses and a binding agent
that binds the affinity tag and is attached to the substrate; (2) disposing on the substrate,
at each address of the plurality of addresses, a nucleic acid of the first plurality and a
nucleic acid of the second plurality; (3) contacting each address of the plurality of
addresses with a translation effector to thereby translate the first and second nucleic acid
sequences; (4) maintaining the substrate under conditions permissive for the affinity tag
to bind the binding agent; (5) optionally washing the substrate to remove the translation
effector and unbound polypeptides; and (6) detecting the recognition tag at each address
of the plurality.

In a preferred embodiment, all possible pairs of amino acid sequences from the
plurality of amino acid sequences are present on the array.
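
For illustration, an array layout covering all possible pairs can be enumerated as in the
sketch below. It treats a pair as ordered (the affinity-tagged sequence first, the
recognition-tagged sequence second) and includes self-pairs; both choices, along with the
function name and the row-by-row grid addressing, are assumptions of this sketch and are not
stated in the description.

```python
from itertools import product


def all_pairs_layout(names: list[str], columns: int) -> dict[tuple[int, int], tuple[str, str]]:
    """Assign every ordered pair of sequences to a (row, column) address.

    The first element of each pair stands for the affinity-tagged sequence and
    the second for the recognition-tagged sequence. Addresses are filled row by
    row on a grid with the given number of columns.
    """
    layout = {}
    for index, pair in enumerate(product(names, repeat=2)):
        layout[divmod(index, columns)] = pair
    return layout


layout = all_pairs_layout(["A", "B", "C"], columns=3)
# Three sequences give nine ordered pairs, one per address.
print(len(layout))
```

With N sequences this layout uses N x N addresses; restricting to unordered pairs would
roughly halve that number but would no longer test each sequence in both tag
orientations.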

Also featured is a database, e.g., in computer memory or a computer readable
medium. Each record of the database can include a field for the amino acid sequence
encoded by the first nucleic acid sequence, a field for the amino acid sequence encoded 
by the second nucleic acid sequence, and a field representing the result (e.g., a qualitative 
or quantitative result) of detecting the recognition tag in the aforementioned method. 
The database can include a record for each address of the plurality present on the array. 
Further the database can include a descriptor or reference for the physical location of the 
nucleic acid sequence on the array. The records can be clustered or have a reference to 
other records (e.g., including hierarchical groupings) based on the result. 
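
One possible shape for such a record is sketched below. The class name, field names,
and the simple threshold-based grouping are hypothetical and serve only to illustrate the
fields listed above (the two encoded amino acid sequences, the detection result, and the
physical location on the array).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionRecord:
    """One database record per array address (illustrative field names)."""
    first_sequence: str          # amino acid sequence encoded by the first nucleic acid
    second_sequence: str         # amino acid sequence encoded by the second nucleic acid
    signal: float                # quantitative result of detecting the recognition tag
    address: tuple[int, int]     # physical location on the array as (row, column)


def cluster_by_result(records: list[InteractionRecord], threshold: float) -> dict[str, list[InteractionRecord]]:
    """Group records by whether the signal meets an assumed detection threshold."""
    groups: dict[str, list[InteractionRecord]] = {"detected": [], "not_detected": []}
    for record in records:
        key = "detected" if record.signal >= threshold else "not_detected"
        groups[key].append(record)
    return groups
```

A qualitative result could be stored instead by replacing the numeric field with a
boolean; hierarchical groupings would need a richer clustering step than this
two-way split.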

Also featured is a method of providing tagged polypeptides. The method 
includes: (1) providing a substrate with a plurality of addresses, each address of the 
plurality comprising (i) a nucleic acid encoding an amino acid sequence comprising a test 
amino acid sequence and an affinity tag, and (ii) a particle attached to a binding agent 
that recognizes the affinity tag; (2) contacting each address of the plurality with a
translation effector to thereby translate the amino acid sequence; and (3) maintaining the
substrate under conditions permissive for the amino acid sequence to contact the binding 
agent. 

In one preferred embodiment, the nucleic acid sequence is also attached to the 
particle. 

In another preferred embodiment, the particle, e.g., a bead or nanoparticle, further 
contains information encoding its identity, e.g., a reference to the address on which it is 
disposed. The particle can be tagged using a chemical tag, or an electronic tag (e.g., a 
transponder). The particles can be disposed on the substrate such that they can be 
removed for later analysis. In one embodiment, multiple particles with the same
identifier are disposed at each address of the plurality. The particles can be collected
after translation and attachment of the amino acid sequence. The particles can then be
subdivided into aliquots. A particle with a given property, e.g., the ability to bind a
labeled compound, can be identified. The identity of the particle can be determined to
thereby identify the amino acid sequence attached to the particle.

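The decoding step described above, in which a recovered particle's tag is read to
identify the attached amino acid sequence, can be pictured as two lookups. The following
sketch is illustrative only; the function name and the two lookup tables are hypothetical
and assume that tag-to-address and address-to-sequence assignments were recorded when the
array was prepared.

```python
def decode_particles(hit_tags, tag_to_address, address_to_sequence):
    """Map each recovered particle tag to its address and attached sequence."""
    decoded = {}
    for tag in hit_tags:
        address = tag_to_address[tag]            # where the particle was disposed
        decoded[tag] = (address, address_to_sequence[address])
    return decoded


# Example bookkeeping recorded at array preparation (made-up values).
tag_to_address = {"tag-001": (0, 0), "tag-002": (0, 1)}
address_to_sequence = {(0, 0): "MKTAYIAK", (0, 1): "MKTGYIAR"}

hits = decode_particles(["tag-002"], tag_to_address, address_to_sequence)
```

Because multiple particles can share one identifier, the same tag may be recovered
from several aliquots; each recovery decodes to the same sequence.
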
In a preferred embodiment, each test amino acid sequence in the plurality of
addresses is unique. For example, a test amino acid sequence can differ from all other
test amino acid sequences of the plurality by one or more amino acid differences (e.g.,
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800,
256, 128, 64, or 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the 
test amino acid sequence encoded by the nucleic acid at each address of the plurality is 
identical to all other test amino acid sequences in the plurality of addresses. In a 
preferred embodiment, the affinity tag encoded by the nucleic acid at each address of the 
plurality is the same, or substantially identical to all other affinity tags in the plurality of 
addresses. In another preferred embodiment, the nucleic acid at each address of the 
plurality encodes more than one affinity tag. In yet another preferred embodiment, the 
affinity tag encoded by the nucleic acid at an address of the plurality differs from at least 
one other affinity tag in the plurality of addresses. 

In a preferred embodiment, the affinity tag is fused directly to the test amino acid 
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another 
preferred embodiment, the affinity tag is separated from the test amino acid by one or
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids,
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or
carboxy-terminal to the test amino acid sequence.

The nucleic acid can be an RNA or a DNA (e.g., a single-stranded DNA or a
double-stranded DNA). In a preferred embodiment, the nucleic acid includes a plasmid
DNA or a fragment thereof; an amplification product (e.g., a product generated by RCA,
PCR, NASBA); or a synthetic DNA.

The nucleic acid can further include one or more of: a transcription promoter; a
transcription regulatory sequence; an untranslated leader sequence; a sequence encoding a
cleavage site; a recombination site; a 3' untranslated sequence; a transcriptional
terminator; and an internal ribosome entry site. In one embodiment, the nucleic acid
sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the
sequence is dicistronic or polycistronic. In another embodiment, the nucleic acid also
includes a sequence encoding a reporter protein, e.g., a protein whose abundance can be
quantitated and can provide an indication of the quantity of test polypeptide fixed to the
plate. The reporter protein can be attached to the test polypeptide, e.g., covalently
attached, e.g., attached as a translational fusion. The reporter protein can be an enzyme,
e.g., β-galactosidase, chloramphenicol acetyl transferase, β-glucuronidase, and so forth.
The reporter protein can produce or modulate light, e.g., a fluorescent protein (e.g., green
fluorescent protein, variants thereof, red fluorescent protein, variants thereof, and the
like), and luciferase.

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter,
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase
promoter. The regulatory components, e.g., the transcription promoter, can vary among
nucleic acids at different addresses of the plurality. For example, different promoters can
be used to vary the amount of polypeptide produced at different addresses.

In one embodiment, the nucleic acid also includes at least one site for
recombination, e.g., homologous recombination or site-specific recombination, e.g., a
lambda att site or variant thereof; a lox site; or a FLP site. In a preferred embodiment,
the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a
test amino acid sequence. In another preferred embodiment, the recombination site
includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid
sequence.

In another embodiment, the nucleic acid includes a sequence encoding a cleavage
site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin
site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical
cleavage site (e.g., a methionine, preferably a unique methionine (cleavage by cyanogen
bromide) or a proline (cleavage by formic acid)).

The nucleic acid can include a sequence encoding a second polypeptide tag in
addition to the affinity tag. The second tag can be C-terminal to the test amino acid
sequence and the affinity tag can be N-terminal to the test amino acid sequence; the
second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be
C-terminal to the test amino acid sequence; the second tag and the affinity tag can be
adjacent to one another, or separated by a linker sequence, both being N-terminal or
C-terminal to the test amino acid sequence. In one embodiment, the second tag is an
additional affinity tag, e.g., the same or different from the first tag. In another
embodiment, the second tag is a recognition tag. For example, the recognition tag can
report the presence and/or amount of test polypeptide at an address. Preferably the
recognition tag has a sequence other than the sequence of the affinity tag. In still another
embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20
tags) are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality
can be the same as or different from the first affinity tag.

The nucleic acid sequence can further include an identifier sequence, e.g., a non-coding
nucleic acid sequence, e.g., one that is synthetically inserted, and allows for [...]
in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500,
10 to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier can be selected so
that it is not complementary or identical to another identifier or any region of each
nucleic acid sequence of the plurality on the array.

The test amino acid sequence can further include a protein splicing sequence or
intein. The intein can be inserted in the middle of a test amino acid sequence. The intein
can be a naturally-occurring intein or a mutated intein.

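The nucleic acid sequences encoding the test amino acid sequences can be
obtained from a collection of full-length expressed genes (e.g., a repository of clones), a
cDNA library, or a genomic library. The test amino acid sequences can be genes
expressed in a tissue, e.g., a normal or diseased tissue. The test polypeptides can be
mutants or variants of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide
hormone etc.). In yet another embodiment, the test polypeptides are random amino acid
sequences, patterned amino acids sequences, or designed amino acids sequences (e.g.,
sequence designed by manual, rational, or computer-aided approaches). The plurality of
test amino acid sequences can include a plurality from a first source, and plurality from a
second source. For example, the test amino acid sequences on half the addresses of an
array are from a diseased tissue or a first species, whereas the sequences on the remaining
half are from a normal tissue or a second species.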

The binding agent can be attached to the substrate. For example, the substrate can
be derivatized and the binding agent covalently attached thereto. The binding agent can be
attached via a bridging moiety, e.g., a specific binding pair, (e.g., the substrate contains a
first member of a specific binding pair, and the binding agent is linked to the second
member of the binding pair, the second member being attached to the substrate).

In another aspect, the invention features a method of providing tagged
peptide. The method includes: providing a substrate with a plurality of addresses,
each address of the plurality having a nucleic acid (i) encoding an amino acid sequence
[...] each address of the plurality with a [...] effector to [...] amino
acid sequence; and maintaining the substrate under conditions permissive for the tag to
[...] to thereby form a complex of the nucleic acid and the test peptide
having the test amino acid sequence.

In one embodiment, the handle is biotin, and the tag is avidin. For example, the
nucleic acid has a biotin covalently attached to a nucleotide. The nucleic acid can be
formed by amplification of a template nucleic acid using a synthetic oligonucleotide
having a biotin moiety covalently attached at its 5' end. In another embodiment, the
handle is glutathione, and the tag is glutathione-S-transferase. For example, the nucleic 
acid has a glutathione moiety covalently attached to a nucleotide. The nucleic acid can be
formed by amplification of a template nucleic acid using a synthetic oligonucleotide
having a glutathione moiety covalently attached at its 5' end.

In one embodiment, the handle includes a keto group, and the tag is a hydrazine. 
A covalent bond is formed between the handle and tag. 

The method can further include combining the complexes formed at all the
addresses into a pool, selecting a polypeptide from the pool, and amplifying the
complexed nucleic acid sequence to thereby identify the selected amino acid sequence.

In a preferred embodiment, each test amino acid sequence in the plurality of
addresses is unique. For example, a test amino acid sequence can differ from all other
test amino acid sequences of the plurality by 1 or more amino acid differences (e.g.,
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800,
256, 128, 64, 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the
test amino acid sequence encoded by the nucleic acid at each address of the plurality is
identical to all other test amino acid sequences in the plurality of addresses. In a
preferred embodiment, the affinity tag encoded by the nucleic acid at each address of the
plurality is the same, or substantially identical to all other affinity tags in the plurality of
addresses. In another preferred embodiment, the nucleic acid at each address of the 
plurality encodes more than one affinity tag. In yet another preferred embodiment, the 
affinity tag encoded by the nucleic acid at an address of the plurality differs from at least 
one other affinity tag in the plurality of addresses. 

In a preferred embodiment, the tag is fused directly to the test amino acid 
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another 
preferred embodiment, the tag is separated from the test amino acid by one or more linker 
amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, preferably about 
1 to 20, or about 3 to 12 amino acids. The linker amino acids can include a cleavage site, 
flexible amino acids (e.g., glycine, alanine, or serine, preferably glycine), and/or polar 
amino acids. The linker and tag can be amino-terminal or carboxy-terminal to the test
amino acid sequence.

The nucleic acid can be an RNA, or a DNA (e.g., a single-stranded DNA, or a
double-stranded DNA). In a preferred embodiment, the nucleic acid includes a plasmid
DNA or a fragment thereof; an amplification product (e.g., a product generated by RCA,
PCR, NASBA); or a synthetic DNA.

The nucleic acid can further include one or more of: a transcription promoter; a
transcription regulatory sequence; an untranslated leader sequence; a sequence encoding a
cleavage site; a recombination site; a 3' untranslated sequence; a transcriptional
terminator; and an internal ribosome entry site. In one embodiment, the nucleic acid
sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the
sequence is dicistronic or polycistronic. In another embodiment, the nucleic acid also
includes a sequence encoding a reporter protein, e.g., a protein whose abundance can be
quantitated and can provide an indication of the quantity of test polypeptide fixed to the
plate. The reporter protein can be attached to the test polypeptide, e.g., covalently
attached, e.g., attached as a translational fusion. The reporter protein can be an enzyme,
e.g., β-galactosidase, chloramphenicol acetyl transferase, β-glucuronidase, and so forth.
The reporter protein can produce or modulate light, e.g., a fluorescent protein (e.g., green
fluorescent protein, variants thereof, red fluorescent protein, variants thereof, and the
like), and luciferase.

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter,
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase
promoter. The regulatory components, e.g., the transcription promoter, can vary among
nucleic acids at different addresses of the plurality. For example, different promoters can
be used to vary the amount of polypeptide produced at different addresses.

In one embodiment, the nucleic acid also includes at least one site for
recombination, e.g., homologous recombination or site-specific recombination, e.g., a
lambda att site or variant thereof; a lox site; or a FLP site. In a preferred embodiment,
the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a
test amino acid sequence. In another preferred embodiment, the recombination site
includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid
sequence.

In another embodiment, the nucleic acid includes a sequence encoding a cleavage
site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin
site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical
cleavage site (e.g., a methionine, preferably a unique methionine (cleavage by cyanogen
bromide) or a proline (cleavage by formic acid)).

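The nucleic acid can include a sequence encoding a second polypeptide tag in
addition to the first tag. The second tag can be C-terminal to the test amino acid sequence
and the first tag can be N-terminal to the test amino acid sequence; the second tag can be
N-terminal to the test amino acid sequence, and the first tag can be C-terminal to the test
amino acid sequence; the second tag and the first tag can be adjacent to one another, or
separated by a linker sequence, both being N-terminal or C-terminal to the test amino
acid sequence. In one embodiment, the second tag is an additional tag, e.g., the
same or different from the first tag. In another embodiment, the second tag is a
recognition tag. For example, the recognition tag can report the presence and/or amount of
test polypeptide at an address. Preferably the recognition tag has a sequence other than
the sequence of the affinity tag. In still another embodiment, a plurality of polypeptide
tags (e.g., less than 3, 4, 5, about 10, or about 20 tags) are encoded in addition to the first
tag. Each polypeptide tag of the plurality can be the same as or different from the
first tag.

The nucleic acid sequence can further include an identifier sequence, e.g., a non-coding
nucleic acid sequence, e.g., one that is synthetically inserted, and allows for [...]
in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500,
10 to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier can be selected so
that it is not complementary or identical to another identifier or any region of each
nucleic acid sequence of the plurality on the array.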

The intein can be a naturally-occurring intein or a mutated intein.

The nucleic acid sequences encoding the test amino acid sequences can be
obtained from a collection of full-length expressed genes (e.g., a repository of clones), a
cDNA library, or a genomic library. The test amino acid sequences can be genes
expressed in a tissue, e.g., a normal or diseased tissue. The test polypeptides can be
mutants or variants of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide
hormone etc.). In yet another embodiment, the test polypeptides are random amino acid
sequences, patterned amino acids sequences, or designed amino acids sequences (e.g.,
sequence designed by manual, rational, or computer-aided approaches). The plurality of
test amino acid sequences can include a plurality from a first source, and plurality from a
second source. For example, the test amino acid sequences on half the addresses of an
array are from a diseased tissue or a first species, whereas the sequences on the remaining
half are from a normal tissue or a second species.

The handle can be attached to the substrate. For example, the substrate can be
derivatized and the handle covalently attached thereto. The handle can be attached via a
bridging moiety, e.g., a specific binding pair, (e.g., the substrate contains a first member
of a specific binding pair, and the handle is linked to the second member of the binding
pair, the second member being attached to the substrate).

In yet another embodiment, an insoluble substrate (e.g., a bead or particle), is
disposed at each address of the plurality, and the handle is attached to the insoluble
substrate. The insoluble substrate can further contain information encoding its identity,
e.g., a reference to the address on which it is disposed. The insoluble substrate can be
tagged using a chemical tag, or an electronic tag (e.g., a transponder). The insoluble
substrate can be disposed such that it can be removed for later analysis.

The invention also features a kit which includes: (1) an array comprising a
plurality of addresses, wherein each address of the plurality comprises a handle and (2) a
vector nucleic acid comprising (i) a promoter; (ii) an entry site; and (iii) a tag encoding
sequence, wherein the tag can be attached to the handle.

The vector nucleic acid can include one or more sites for insertion of a test amino
acid sequence (e.g., a recombination site or a restriction site), and a sequence encoding a
tag. In a preferred embodiment, the vector nucleic acid has two sites for insertion, and
a gene inserted between the two sites. In another embodiment, the sites for insertion
are homologous recombination or site-specific recombination sites, e.g., a lambda att
site or variant thereof; a lox site; or a FLP site. In a preferred embodiment, one or both
recombination sites lack stop codons in the reading frame of a nucleic acid encoding a
test amino acid sequence. In another preferred embodiment, one or both recombination
sites include a stop codon in the reading frame of a nucleic acid encoding a test amino
acid sequence.

In a much preferred embodiment, the tag is in frame with the translation frame of 
a nucleic acid sequence (e.g., a sequence to be inserted) encoding a test amino acid 
sequence. In a preferred embodiment, the tag is fused directly to the test amino acid 
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another 
preferred embodiment, the tag is separated from the test amino acid by one or more linker 
amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, preferably about 
1 to 20 or about 3 to 12 amino acids. The linker amino acids can include a cleavage site, 
flexible amino acids (e.g., glycine, alanine, or serine, preferably glycine), and/or polar 
amino acids. The linker and tag can be amino-terminal or carboxy-terminal to the test 
amino acid sequence. The cleavage site can be a protease site, e.g., a site cleaved by a 
site-specific protease (e.g., a thrombin site, an enterokinase site, a PreScission site, a 
factor Xa site, or a TEV site), or a chemical cleavage site (e.g., a methionine, preferably a 
unique methionine (cleavage by cyanogen bromide) or a proline (cleavage by formic 
acid)). 
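
By way of illustration only, the arrangement of tag, linker, and test amino acid sequence described above can be sketched in Python. The function name `build_fusion`, the default glycine/serine linker `GGSGGS`, and the example sequences are assumptions made for this sketch; none of them is taken from the specification.

```python
def build_fusion(test_seq, tag_seq, linker="GGSGGS", tag_position="N"):
    """Return the amino acid sequence of a tagged test polypeptide.

    tag_position="N" places the tag and linker amino-terminal to the test
    sequence; tag_position="C" places them carboxy-terminal. An empty
    linker models a tag fused directly to the test sequence.
    """
    if tag_position == "N":
        return tag_seq + linker + test_seq
    if tag_position == "C":
        return test_seq + linker + tag_seq
    raise ValueError("tag_position must be 'N' or 'C'")


# Hypothetical example: a hexahistidine tag joined to a short test sequence.
n_terminal = build_fusion("MKTAY", "HHHHHH", linker="GGS")
direct_c_terminal = build_fusion("MKTAY", "HHHHHH", linker="", tag_position="C")
```

A cleavage site could be modelled in the same way by including its residues in the `linker` argument.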

In one embodiment, the handle includes a keto group, and the tag is a hydrazine. 
A covalent bond is formed between the handle and tag. The kit can further include an 
unnatural amino acid having a keto group, e.g., a reactable keto group on a side chain. 
The kit can also further include a tRNA, and optionally a tRNA synthetase for amino- 
acylating the tRNA with the unnatural amino acid. The tRNA can be a stop codon 
suppressing tRNA. 

In a preferred embodiment, the kit also includes at least a second vector nucleic 
acid. The second vector nucleic acid can include one or more sites for insertion of a test 
amino acid sequence (e.g., a recombination site or a restriction site). 

In another embodiment, the kit also includes multiple nucleic acids encoding 
unique test amino acid sequences. These encoding nucleic acids can be flanked, e.g., on 
both ends by a site, e.g., a site compatible with the vector nucleic acid (e.g., having 
sequence for recombination with a sequence in the vector, or having a restriction site
which leaves an overhang or blunt end such that the overhang or blunt end can be ligated
into the vector nucleic acid (e.g., the restricted vector nucleic acid)). 

In another preferred embodiment, the kit also includes a transcription effector
and/or a translation effector.

In a preferred embodiment, the second vector nucleic acid has a recognition tag, 
e.g., an epitope tag, an enzyme, a fluorescent protein (e.g., GFP, BFP, variants thereof). 

The first and/or second vector nucleic acid can further include one or more of: a
transcription promoter; a transcription regulatory sequence; an untranslated leader
sequence; a sequence encoding a cleavage site; a recombination site; a 3' untranslated
sequence; a transcriptional terminator; and an internal ribosome entry site. In one
embodiment, the nucleic acid sequence includes a plurality of cistrons (also termed "open
reading frames"), e.g., the sequence is dicistronic or polycistronic. In another
embodiment, the nucleic acid also includes a sequence encoding a reporter protein, e.g., a
protein whose abundance can be quantitated and can provide an indication of the quantity
of test polypeptide fixed to the plate. The reporter protein can be attached to the test
polypeptide, e.g., covalently attached, e.g., attached as a translational fusion. The
reporter protein can be an enzyme, e.g., β-galactosidase, chloramphenicol acetyl
transferase, β-glucuronidase, and so forth. The reporter protein can produce or modulate
light, e.g., a fluorescent protein (e.g., green fluorescent protein, variants thereof, red
fluorescent protein, variants thereof, and the like), and luciferase.

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter, 
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase 
promoter. 

In a preferred embodiment, the kit also includes a recombinase, a ligase, and/or a 
restriction endonuclease. For example, the recombinase can mediate recombination, e.g., 
site-specific recombination or homologous recombination, between a recombination site 
on the test nucleic acid and a recombination sequence on the vector nucleic acid. For 
example, the recombinase can be lambda integrase, HIV integrase, Cre, or FLP 
recombinase. 

In a preferred embodiment, each address of the plurality has a handle capable of 
recognizing the tag. The handle can be attached to the substrate. For example, the
substrate can be derivatized and the handle covalently attached thereto. The handle can be
attached via a bridging moiety, e.g., a specific binding pair, (e.g., the substrate contains a 
first member of a specific binding pair, and the handle is linked to the second member of 
the binding pair, the second member being attached to the substrate). 

In yet another embodiment, the array of the kit includes an insoluble substrate 
(e.g., a bead or particle), disposed at each address of the plurality, and the handle is 
attached to the insoluble substrate. The insoluble substrate can further contain 
information encoding its identity, e.g., a reference to the address on which it is disposed. 
The insoluble substrate can be tagged using a chemical tag, or an electronic tag (e.g., a
transponder). The insoluble substrate can be disposed such that it can be removed for
later analysis.

The first or second vector nucleic acid can include a sequence encoding a second
polypeptide tag in addition to the tag. The second tag can be C-terminal to the test amino
acid sequence and the tag can be N-terminal to the test amino acid sequence; the second
tag can be N-terminal to the test amino acid sequence, and the tag can be C-terminal to
the test amino acid sequence; the second tag and the tag can be adjacent to one another,
or separated by a linker sequence, both being N-terminal or C-terminal to the test amino
acid sequence. In one embodiment, the second tag is an additional tag, e.g., the same or
different from the first tag. In another embodiment, the second tag is a recognition tag. 
For example, the recognition tag can report the presence and/or amount of test 
polypeptide at an address. Preferably the recognition tag has a sequence other than the 
sequence of the tag. In still another embodiment, a plurality of polypeptide tags (e.g., 
less than 3, 4, 5, about 10, or about 20 tags) are encoded in addition to the first tag. Each 
polypeptide tag of the plurality can be the same as or different from the first tag. 

The first or second vector nucleic acid sequence can further include a sequence 
encoding a protein splicing sequence or intein. The intein can be inserted in the middle 
of a test amino acid sequence. The intein can be a naturally-occurring intein or a mutated 
intein. 

The nucleic acids encoding the test amino acid sequences can be obtained from a 
collection of full-length expressed genes (e.g., a repository of clones), a cDNA library, or 
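a genomic library. The encoding nucleic acids can be nucleic acids (e.g., an mRNA or
cDNA) expressed in a tissue, e.g., a normal or diseased tissue. The test polypeptides (i.e.,
the test amino acid sequences) can be mutants or variants of a scaffold protein (e.g., an
antibody, zinc-finger, polypeptide hormone etc.). In yet another embodiment, the test
polypeptides are random amino acid sequences, patterned amino acids sequences, or
designed amino acids sequences (e.g., sequence designed by manual, rational, or
computer-aided approaches). The plurality of test amino acid sequences can include a
plurality from a first source, and plurality from a second source. For example, the test
amino acid sequences on half the addresses of an array are from a diseased tissue or a
first species, whereas the sequences on the remaining half are from a normal tissue or a
second species.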
[...] or a computer readable medium (e.g., a CD-ROM, a magnetic disc, flash memory). Each
[...] encoding nucleic acid sequence in the kit, e.g., location in a microtitre plate.
Optionally, [...] result of detecting the polypeptide encoded by the nucleic acid sequence.
The database can include a record for each address of the plurality present on the array.
The records can be clustered or have a reference to other records (e.g., including
hierarchical [...]) [...] the result. The software can contain computer readable code to
[...] include instructions for use of the array or a link or indication of a network
resource (e.g., a web site) having instructions for use of the array or the above
database of records describing the addresses of the array.
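
As a non-limiting sketch, a database of the kind described above, with one record per address holding the encoded sequence, a physical location, and an optional detection result, might be represented as follows. The class name `AddressRecord`, its field names, the helper functions, and the example values are all hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressRecord:
    address: str                           # location on the array, e.g. "A1"
    amino_acid_sequence: str               # sequence encoded at this address
    plate_location: Optional[str] = None   # e.g. microtitre plate well of the clone
    result: Optional[float] = None         # detection result, if one was recorded


def build_database(records):
    """Index records by address, allowing only one record per address."""
    database = {}
    for record in records:
        if record.address in database:
            raise ValueError(f"duplicate address: {record.address}")
        database[record.address] = record
    return database


def addresses_with_signal(database, threshold):
    """Return the addresses whose recorded result meets the threshold."""
    return sorted(
        address
        for address, record in database.items()
        if record.result is not None and record.result >= threshold
    )


# Hypothetical records for three addresses of an array.
db = build_database([
    AddressRecord("A1", "MKTAY", plate_location="plate1:B3", result=0.9),
    AddressRecord("A2", "MRTAY", plate_location="plate1:B4", result=0.1),
    AddressRecord("A3", "MSTAY"),
])
hits = addresses_with_signal(db, threshold=0.5)
```

Grouping records by result, as in `addresses_with_signal`, is one simple stand-in for the clustering of records mentioned above.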

[...] kit, and a plurality of nucleic acid sequences, each encoding a unique test amino
acid sequence, and [...]
nucleic acid to thereby generate a test nucleic acid sequence encoding a test polypeptide
comprising the test amino acid sequence and the tag; and disposing each of the plurality
of test nucleic acid sequences at an address of the array.

Another featured kit includes: an array comprising a substrate having a plurality
of addresses, wherein each address of the plurality comprises a handle, and a nucleic acid
sequence encoding an amino acid sequence comprising: (a) a test amino acid sequence,
and (b) a tag. The kit can optionally further include at least one of: a translation effector
and a transcription effector.

The nucleic acid can be an RNA, or a DNA (e.g., a single-stranded DNA, or a
double-stranded DNA). In a preferred embodiment, the nucleic acid includes a plasmid
DNA or a fragment thereof; an amplification product (e.g., a product generated by RCA,
PCR, NASBA); or a synthetic DNA.

The nucleic acid can further include one or more of: a transcription promoter; a
transcription regulatory sequence; an untranslated leader sequence; a sequence encoding a
cleavage site; a recombination site; a 3' untranslated sequence; a transcriptional
terminator; and an internal ribosome entry site. In one embodiment, the nucleic acid
sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the
sequence is dicistronic or polycistronic. In another embodiment, the nucleic acid also
includes a sequence encoding a reporter protein, e.g., a protein whose abundance can be
quantitated and can provide an indication of the quantity of test polypeptide fixed to the
plate. The reporter protein can be attached to the test polypeptide, e.g., covalently
attached, e.g., attached as a translational fusion. The reporter protein can be an enzyme,
e.g., β-galactosidase, chloramphenicol acetyl transferase, β-glucuronidase, and so forth.
The reporter protein can produce or modulate light, e.g., a fluorescent protein (e.g., green
fluorescent protein, variants thereof, red fluorescent protein, variants thereof, and the
like), and luciferase.

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter,
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase
promoter. The regulatory components, e.g., the transcription promoter, can vary among
nucleic acids at different addresses of the plurality. For example, different promoters can
be used to vary the amount of polypeptide produced at different addresses.

In one embodiment, the nucleic acid also includes at least one site for
recombination, e.g., homologous recombination or site-specific recombination, e.g., a 
lambda att site or variant thereof; a lox site; or a FLP site. In a preferred embodiment, 
the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a 
test amino acid sequence. In another preferred embodiment, the recombination site 
includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid 
sequence. 
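
Whether a given site carries a stop codon in the reading frame of the test amino acid sequence can be checked mechanically. The following Python sketch assumes a DNA sequence written 5' to 3' on the coding strand and the three standard stop codons; the function name and the short example sequences are hypothetical and do not represent any actual recombination site.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codons_in_frame(dna, frame=0):
    """Return the start positions of stop codons read in the given frame.

    frame is 0, 1, or 2: the offset of the first codon from the start of
    the sequence. An empty list means the sequence can be read through.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    dna = dna.upper()
    return [
        i for i in range(frame, len(dna) - 2, 3)
        if dna[i:i + 3] in STOP_CODONS
    ]


# Hypothetical site sequences, used only to exercise the function.
open_site = "GGTGGCAGCGGT"      # GGT GGC AGC GGT: no stop codon in frame 0
closed_site = "ATGGGATAAGGC"    # ATG GGA TAA GGC: TAA at position 6 in frame 0
```

A site intended to permit a carboxy-terminal fusion would be expected to give an empty list in the frame of the test sequence, whereas a site intended to terminate translation would not.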

In another embodiment, the nucleic acid includes a sequence encoding a cleavage
site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin
site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical
cleavage site (e.g., a methionine, preferably a unique methionine (cleavage by cyanogen
bromide) or a proline (cleavage by formic acid)).
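
The preference for a unique methionine can be illustrated with a short Python sketch. It assumes the usual simplification that cyanogen bromide cleaves the peptide bond on the carboxy-terminal side of every methionine and ignores incomplete cleavage; the function names and the example sequence are hypothetical.

```python
def cnbr_fragments(protein):
    """Split a protein after each methionine (idealized CNBr cleavage)."""
    fragments = []
    start = 0
    for i, residue in enumerate(protein):
        if residue == "M":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


def has_unique_methionine(protein):
    """True if the protein contains exactly one methionine."""
    return protein.count("M") == 1


# Hypothetical fusion: tag and linker, a single methionine, then the test sequence.
fusion = "HHHHHHGSMAKTLV"
fragments = cnbr_fragments(fusion)
```

With a single methionine at the junction, the idealized cleavage yields exactly two fragments, one carrying the tag and one carrying the test sequence.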

In a preferred embodiment, each test amino acid sequence in the plurality of
addresses is unique. For example, a test amino acid sequence can differ from all other
test amino acid sequences of the plurality by 1 or more amino acid differences (e.g.,
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800,
256, 128, 64, 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the
test amino acid sequence encoded by the nucleic acid at each address of the plurality is
identical to all other test amino acid sequences in the plurality of addresses. In a
preferred embodiment, the affinity tag encoded by the nucleic acid at each address of the 
plurality is the same, or substantially identical to all other affinity tags in the plurality of 
addresses. In another preferred embodiment, the nucleic acid at each address of the 
plurality encodes more than one affinity tag. In yet another preferred embodiment, the 
affinity tag encoded by the nucleic acid at an address of the plurality differs from at least 
one other affinity tag in the plurality of addresses. 
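
The number of amino acid differences between the test sequences of an array can be tallied as in the following Python sketch. It counts mismatched positions without alignment and treats any difference in length as additional differences, which is a simplifying assumption; the function names and the example sequences are hypothetical.

```python
from itertools import combinations


def count_differences(a, b):
    """Count position-wise mismatches plus any difference in length."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def min_pairwise_differences(sequences):
    """Return the smallest number of differences between any two sequences."""
    return min(count_differences(a, b) for a, b in combinations(sequences, 2))


# Hypothetical test sequences disposed at three addresses.
test_sequences = ["MKTA", "MKSA", "MRTA"]
smallest = min_pairwise_differences(test_sequences)
all_unique = smallest >= 1
```

A result of at least 1 corresponds to every test amino acid sequence in the plurality being unique.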

In a preferred embodiment, the affinity tag is fused directly to the test amino acid
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another
preferred embodiment, the affinity tag is separated from the test amino acid by one or
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids,
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or
carboxy-terminal to the test amino acid sequence.

The nucleic acid can include a sequence encoding a second polypeptide tag in
addition to the affinity tag. The second tag can be C-terminal to the test amino acid
sequence and the affinity tag can be N-terminal to the test amino acid sequence; the
second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be
C-terminal to the test amino acid sequence; the second tag and the affinity tag can be
adjacent to one another, or separated by a linker sequence, both being N-terminal or
C-terminal to the test amino acid sequence. In one embodiment, the second tag is an
additional affinity tag, e.g., the same or different from the first tag. In another
embodiment, the second tag is a recognition tag. For example, the recognition tag can
report the presence and/or amount of test polypeptide at an address. Preferably the
recognition tag has a sequence other than the sequence of the affinity tag. In still another
embodiment, a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20
tags) are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality
can be the same as or different from the first affinity tag.

The nucleic acid sequence can further include an identifier sequence, e.g., a non-coding
nucleic acid sequence, e.g., one that is synthetically inserted, and allows for [...]
in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500,
10 to 100, 10 to 50, or about 10 to 30 nucleotides in length. The identifier can be selected so
that it is not complementary or identical to another identifier or any region of each
nucleic acid sequence of the plurality on the array.
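
The selection criterion for an identifier can be sketched in Python as shown below. The sketch assumes DNA sequences over A, C, G, and T, tests only for exact identity and exact reverse complementarity (partial matches are ignored), and uses a 10 to 30 nucleotide length window as one of the ranges mentioned above; the function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def identifier_is_acceptable(candidate, other_identifiers, array_sequences,
                             min_len=10, max_len=30):
    """Check a candidate identifier against identifiers and array sequences.

    The candidate is rejected if it is outside the length window, if it is
    identical or exactly complementary to another identifier, or if it or
    its reverse complement occurs anywhere in a sequence on the array.
    """
    candidate = candidate.upper()
    if not (min_len <= len(candidate) <= max_len):
        return False
    rc = reverse_complement(candidate)
    for identifier in other_identifiers:
        if identifier.upper() in (candidate, rc):
            return False
    for sequence in array_sequences:
        sequence = sequence.upper()
        if candidate in sequence or rc in sequence:
            return False
    return True


# Hypothetical candidate checked against one identifier and one array sequence.
ok = identifier_is_acceptable(
    "ACGTTGCAAG", ["GATTACAGATTACA"], ["ATGCCCGGGTTTAAA"]
)
```

A practical screen would likely also reject near matches, which this exact-match sketch does not attempt.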

The nucleic acid sequence can further include a sequence encoding a protein
splicing sequence or intein. The intein can be inserted in the middle of a test amino acid
sequence. The intein can be a naturally-occurring intein or a mutated intein.

The nucleic acids encoding the test amino acid sequences can be obtained from a
collection of full-length expressed genes (e.g., a repository of clones), a cDNA library, or
a genomic library. The encoding nucleic acids can be nucleic acids (e.g., an mRNA
or cDNA) expressed in a tissue, e.g., a normal or diseased tissue. The test polypeptides
(i.e., the test amino acid sequences) can be mutants or variants of a scaffold protein (e.g., an
antibody, zinc-finger, polypeptide hormone etc.). In yet another embodiment, the test
polypeptides are random amino acid sequences, patterned amino acids sequences, or
designed amino acids sequences (e.g., sequence designed by manual, rational, or
computer-aided approaches). The plurality of test amino acid sequences can include a
plurality from a first source, and plurality from a second source. For example, the test
amino acid sequences on half the addresses of an array are from a diseased tissue or a
first species, whereas the sequences on the remaining half are from a normal tissue or a 
second species. 

In a preferred embodiment, each address of the plurality further includes one or
more second nucleic acids, e.g., a plurality of unique nucleic acids. Hence, the plurality
in toto can encode a plurality of test sequences. For example, each address of the
plurality can encode a pool of test polypeptide sequences, e.g., a subset of a library or
clone bank. A second array can be provided in which each address of the plurality of the
second array includes a single or subset of members of the pool present at an address of
the first array. The first and the second array can be used consecutively.
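
The consecutive use of a pooled first array and a second array can be pictured with the following Python sketch, in which clones are represented by names only. The function names, the pool size, and the choice of address 1 as the address of interest are assumptions made for illustration.

```python
def make_pooled_array(clones, pool_size):
    """Assign consecutive groups of clones to the addresses of a first array."""
    pools = [clones[i:i + pool_size] for i in range(0, len(clones), pool_size)]
    return {address: pool for address, pool in enumerate(pools)}


def make_second_array(first_array, hit_address):
    """Place each member of one pool at its own address of a second array."""
    return {
        address: [clone]
        for address, clone in enumerate(first_array[hit_address])
    }


# Hypothetical library of six clones pooled three per address.
library = ["c0", "c1", "c2", "c3", "c4", "c5"]
first_array = make_pooled_array(library, pool_size=3)
second_array = make_second_array(first_array, hit_address=1)
```

Here the second array holds one member of the selected pool per address; a subset of members per address could be produced by applying `make_pooled_array` to the selected pool with a smaller pool size.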

In other preferred embodiments, each address of the plurality further includes a
second nucleic acid encoding a second amino acid sequence. 

In one preferred embodiment, each address of the plurality includes a first test 
amino acid sequence that is common to all addresses of the plurality, and a second test 
amino acid sequence that is unique among all the addresses of the plurality. For example, 
the second test amino acid sequences can be query sequences whereas the first test
amino acid sequence can be a target sequence. In another preferred embodiment, each
address of the plurality includes a first test amino acid sequence that is unique among all
the addresses of the plurality, and a second test amino acid sequence that is common to
all addresses of the plurality. For example, the first test amino acid sequences can be
query sequences whereas the second test amino acid sequence can be a target
sequence. The second nucleic acid encoding the second test amino acid sequence can 
include a sequence encoding a recognition tag and/or an affinity tag. 

At at least one address of the plurality, the first and second amino acid sequences
can be such that they interact with one another. In one preferred embodiment, they are
capable of binding to each other. The second test amino acid sequence is optionally
fused to a detectable amino acid sequence, e.g., an epitope tag, an enzyme, a fluorescent
protein (e.g., GFP, BFP, variants thereof). The second test amino acid sequence can be 
itself detectable (e.g., an antibody is available which specifically recognizes it). In 
another preferred embodiment, one is capable of modifying the other (e.g., making or 
breaking a bond, preferably a covalent bond, of the other). For example, the first amino 
acid sequence is a kinase capable of phosphorylating the second amino acid sequence; the
first is a methylase capable of methylating the second; the first is a ubiquitin ligase 
capable of ubiquitinating the second; the first is a protease capable of cleaving the 
second; and so forth. 

Kits of these embodiments can be used to identify an interaction or to identify a 
compound that modulates, e.g., inhibits or enhances, an interaction. 

The binding agent can be attached to the substrate. For example, the substrate can 
be derivatized and the binding agent covalently attached thereto. The binding agent can be 
attached via a bridging moiety, e.g., a specific binding pair, (e.g., the substrate contains a 
first member of a specific binding pair, and the binding agent is linked to the second 
member of the binding pair, the second member being attached to the substrate). 

In yet another embodiment, an insoluble substrate (e.g., a bead or particle) is 
disposed at each address of the plurality, and the binding agent is attached to the 
insoluble substrate. The insoluble substrate can further contain information encoding its 
identity, e.g., a reference to the address on which it is disposed. The insoluble substrate 
can be tagged using a chemical tag, or an electronic tag (e.g., a transponder). The 
insoluble substrate can be disposed such that it can be removed for later analysis. 

The kit can further include a database, e.g., in computer memory or a computer 
readable medium (e.g., a CD-ROM, a magnetic disc, flash memory). Each record of the 
database can include a field for the amino acid sequence encoded by the nucleic acid 
sequence and a descriptor or reference for the physical location of the nucleic acid 
sequence on the array. Optionally, the record also includes a field representing a result 
(e.g., a qualitative or quantitative result) of detecting the polypeptide encoded by the 
nucleic acid sequence. The database can include a record for each address of the 
plurality present on the array. The records can be clustered or have a reference to other 
records (e.g., including hierarchical groupings) based on the result. 

The kit can also include instructions for use of the array or a link or indication of 
a network resource (e.g., a web site) having instructions for use of the array or the above 
database of records describing the addresses of the array. 
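
By way of illustration only, one record of such a database could be sketched as follows. The class name `AddressRecord`, its field names, and the helper `cluster_by_result` are hypothetical and are not taken from the kit itself; the threshold-based grouping is merely one assumed way of clustering records based on a result.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class AddressRecord:
    """One hypothetical database record per address of the array."""
    address: str                      # descriptor of the physical location, e.g. "A1"
    amino_acid_sequence: str          # sequence encoded by the nucleic acid at the address
    result: Optional[Union[float, str]] = None   # quantitative or qualitative result
    related: List[str] = field(default_factory=list)  # references to other records


def cluster_by_result(records: List[AddressRecord],
                      threshold: float) -> Dict[str, List[str]]:
    """Group addresses by comparing a quantitative result with a threshold.

    Records without a numeric result are placed in an "undetected" group.
    """
    clusters: Dict[str, List[str]] = {"positive": [], "negative": [], "undetected": []}
    for rec in records:
        if not isinstance(rec.result, (int, float)):
            clusters["undetected"].append(rec.address)
        elif rec.result >= threshold:
            clusters["positive"].append(rec.address)
        else:
            clusters["negative"].append(rec.address)
    return clusters
```

In this sketch the `related` field stands in for the reference to other records, so that hierarchical groupings could be stored alongside each address.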

In another aspect, the invention features a method of providing an array across a 
network, e.g., a computer network, or a telecommunications network. The method 
includes: providing a substrate comprising a plurality of addresses, each address of the 
plurality having a binding agent; providing a plurality of nucleic acid sequences, each 
nucleic acid sequence comprising a sequence encoding a test amino acid sequence and an 
affinity tag that is recognized by the binding agent; providing on a server a list of either 
(i) nucleic acid sequences of the plurality or (ii) subsets of the plurality (e.g., categorized 
groups of sequences); transmitting the list across a network to a user; receiving at least 
one selection of the list from the user; disposing the one or more nucleic acid sequences 
corresponding to the selection on an address of the plurality; and providing the substrate 
to the user. 

In one embodiment, each nucleic acid sequence is disposed at a unique address. 
For example, if a subset is selected, each nucleic acid sequence of the subset is disposed 
at a unique address. In another embodiment, a plurality of nucleic acid sequences are 
disposed at each address. 

The method can further include contacting each address of the plurality with one 
or more of (i) a transcription effector, and (ii) a translation effector. Optionally, the 
substrate is maintained under conditions permissive for the amino acid sequence to bind 
the binding agent. One or more addresses can then be washed, e.g., to remove at least 
one of (i) the nucleic acid, (ii) the transcription effector, (iii) the translation effector, 
and/or (iv) an unwanted polypeptide, e.g., an unbound polypeptide or unfolded 
polypeptide. The array can optionally be contacted with a compound, e.g., a chaperone; a 
protease; a protein-modifying enzyme; a small molecule, e.g., a small organic compound 
(e.g., of molecular weight less than 5000, 3000, 1000, 700, 500, or 300 Daltons); nucleic 
acids; or other complex macromolecules, e.g., complex sugars, lipids, or matrix 
molecules. 

The array can be further processed, e.g., prepared for storage. It can be enclosed 
in a package, e.g., an air- or water-resistant package. The array can be desiccated, 
frozen, or contacted with a storage agent (e.g., a cryoprotectant, an anti-bacterial, an 
anti-fungal). For example, an array can be rapidly frozen after being optionally contacted 
with a cryoprotectant. This step can be done at any point in the process (e.g., before or 
after contacting the array with an RNA polymerase; before or after contacting the array 
with a translation effector; or before or after washing the array). The packaged product 
can be supplied to a user with or without additional contents, e.g., a transcription effector, 
a translation effector, a vector nucleic acid, an antibody, and so forth. 

In a preferred embodiment, each test amino acid sequence in the plurality of 
addresses is unique. For example, a test amino acid sequence can differ from all other 
test amino acid sequences of the plurality by 1 or more amino acid differences (e.g., 
about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800, 
256, 128, 64, or 32, 16, 8, 4, or fewer differences). In another preferred embodiment, the 
test amino acid sequence encoded by the nucleic acid at each address of the plurality is 
identical to all other test amino acid sequences in the plurality of addresses. In a 
preferred embodiment, the affinity tag encoded by the nucleic acid at each address of the 
plurality is the same, or substantially identical to all other affinity tags in the plurality of 
addresses. In another preferred embodiment, the nucleic acid at each address of the 
plurality encodes more than one affinity tag. In yet another preferred embodiment, the 
affinity tag encoded by the nucleic acid at an address of the plurality differs from at least 
one other affinity tag in the plurality of addresses. 

In a preferred embodiment, the affinity tag is fused directly to the test amino acid 
sequence, e.g., directly amino-terminal, or directly carboxy-terminal. In another 
preferred embodiment, the affinity tag is separated from the test amino acid by one or 
more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, 
preferably about 1 to 20, or about 3 to 12 amino acids. The linker amino acids can 
include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably 
glycine), and/or polar amino acids. The linker and affinity tag can be amino-terminal or 
carboxy-terminal to the test amino acid sequence. 
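
As a purely illustrative sketch of the arrangements just described, the following assembles an amino acid sequence from an affinity tag, an optional linker, and a test amino acid sequence. The function name `build_fusion`, its parameters, and the default glycine/serine linker are assumptions chosen for the example and are not prescribed by the text; an empty linker corresponds to a direct fusion.

```python
def build_fusion(test_seq: str, tag_seq: str,
                 linker: str = "GGGS", tag_position: str = "N") -> str:
    """Return a fusion sequence in one-letter amino acid code.

    tag_position "N": tag and linker are amino-terminal to the test sequence.
    tag_position "C": linker and tag are carboxy-terminal to the test sequence.
    """
    if tag_position == "N":
        return tag_seq + linker + test_seq
    if tag_position == "C":
        return test_seq + linker + tag_seq
    raise ValueError("tag_position must be 'N' or 'C'")
```

For example, `build_fusion("MKT", "HHHHHH", linker="", tag_position="C")` would describe a hexahistidine tag fused directly carboxy-terminal to a (hypothetical) three-residue test sequence.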

The nucleic acid can be an RNA, or a DNA (e.g., a single-stranded DNA, or a 
double-stranded DNA). In a preferred embodiment, the nucleic acid includes a plasmid 
DNA or a fragment thereof; an amplification product (e.g., a product generated by RCA, 
PCR, NASBA); or a synthetic DNA. 

The nucleic acid can further include one or more of: a transcription promoter; a 
translation regulatory sequence; an untranslated leader sequence; a sequence encoding a 
cleavage site; a recombination site; a 3' untranslated sequence; a transcriptional 
terminator; and an internal ribosome entry site. In one embodiment, the nucleic acid 
sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the 
sequence is dicistronic or polycistronic. In another embodiment, the nucleic acid also 
includes a sequence encoding a reporter protein, e.g., a protein whose abundance can be 
quantitated and can provide an indication of the quantity of test polypeptide fixed to the 
plate. The reporter protein can be attached to the test polypeptide, e.g., covalently 
attached, e.g., attached as a translational fusion. The reporter protein can be an enzyme, 
e.g., β-galactosidase, chloramphenicol acetyl transferase, β-glucuronidase, and so forth. 
The reporter protein can produce or modulate light, e.g., a fluorescent protein (e.g., green 
fluorescent protein, variants thereof, red fluorescent protein, variants thereof, and the 
like), and luciferase. 

The transcription promoter can be a prokaryotic promoter, a eukaryotic promoter, 
or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase 
promoter. The regulatory components, e.g., the transcription promoter, can vary among 
nucleic acids at different addresses of the plurality. For example, different promoters can 
be used to vary the amount of polypeptide produced at different addresses. 

In one embodiment, the nucleic acid also includes at least one site for 
recombination, e.g., homologous recombination or site-specific recombination, e.g., a 
lambda att site or variant thereof; a lox site; or a FLP site. In a preferred embodiment, 
the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a 
test amino acid sequence. In another preferred embodiment, the recombination site 
includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid 
sequence. 

In yet another embodiment, the nucleic acid includes a sequence encoding a cleavage 
site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin 
site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical 
cleavage site (e.g., a methionine (cleavage by cyanogen bromide) or a proline (cleavage by 
formic acid)). 

sequence and the affinity tag can be N-terminal to the test amino acid sequence; the 
tag can be N-terminal to the test amino acid sequence, and the affinity tag can be 
C-terminal to the test amino acid sequence. In one embodiment, the second tag is an 
additional affinity tag, e.g., the same or different from the first tag. In another 
embodiment, the second tag is a recognition tag. [...] 
recognition tag has a sequence other than the sequence of the affinity tag. 
[...] plurality of polypeptide tags (e.g., less than 3, 4, 5, [...]) 
are encoded in addition to the first affinity tag. Each polypeptide tag of the plurality 
can be the same as or different from the first affinity tag. 

The nucleic acid sequence can further include an identifier sequence, e.g., a 
non-coding nucleic acid sequence, e.g., one that is synthetically inserted, and allows for 
[...] to uniquely identify each sequence in the plurality; e.g., [...] about 5 to [...] 
[...] is not complementary or identical to another identifier or any region of each 
nucleic acid sequence of the plurality on the array. 

[...] can be a naturally-occurring intein or a mutated intein. 

The nucleic acid sequences of the plurality can be obtained from a collection of 
[...] or diseased tissue. The test polypeptides can be mutants or variants of a scaffold protein 
(e.g., an antibody, zinc-finger, polypeptide hormone, etc.). In yet another embodiment, 
the test polypeptides are random amino acid sequences, patterned amino acid sequences, 
or designed amino acid sequences (e.g., sequences designed by manual, rational, or 
computer-aided approaches). The plurality of test amino acid sequences can include a 
plurality from a first source, and a plurality from a second source. For example, the server 
can be provided with lists of test amino acid sequences associated with a diseased tissue 
or a first species in addition to lists of test amino acid sequences associated with a normal 
tissue or a second species. 

The binding agent can be attached to the substrate. For example, the substrate can 
be derivatized and the binding agent covalently attached thereto. The binding agent can be 
attached via a bridging moiety, e.g., a specific binding pair, (e.g., the substrate contains a 
first member of a specific binding pair, and the binding agent is linked to the second 
member of the binding pair, the second member being attached to the substrate). 

In yet another embodiment, an insoluble substrate (e.g., a bead or particle) is 
disposed at each address of the plurality, and the binding agent is attached to the 
insoluble substrate. The insoluble substrate can further contain information encoding its 
identity, e.g., a reference to the address on which it is disposed. The insoluble substrate 
can be tagged using a chemical tag, or an electronic tag (e.g., a transponder). The 
insoluble substrate can be disposed such that it can be removed for later analysis. 

The invention also features a computer system including (i) a server storing a list 
of amino acid sequences and/or their descriptors, and (ii) software configured to: (1) send 
a list of amino acid sequences and/or their descriptors to a client; (2) receive from the 
client a plurality of selected amino acid sequences from the list; and (3) interface with an 
array provider (e.g., a robotic system, or a technician) so as to dispose on a substrate 
nucleic acids encoding the selected amino acid sequences, each at a plurality of 
addresses. 
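
The selection-handling step of such software can be sketched, under stated assumptions, as a mapping from a client's selection to addresses on the substrate. The function `assign_addresses`, the dictionary-based catalog, and the choice of one address per selected sequence are hypothetical simplifications for illustration; the interface to a robotic system or technician is not modeled.

```python
from typing import Dict, List


def assign_addresses(catalog: Dict[str, str], selection: List[str],
                     n_addresses: int) -> Dict[int, str]:
    """Map each selected entry of the server's list to an address index.

    catalog: entry name -> encoding nucleic acid sequence (the server's list).
    selection: entry names chosen by the client; duplicates are collapsed.
    n_addresses: number of addresses available on the substrate.
    """
    unknown = [name for name in selection if name not in catalog]
    if unknown:
        raise KeyError(f"not in catalog: {unknown}")
    unique = list(dict.fromkeys(selection))  # drop duplicates, keep order
    if len(unique) > n_addresses:
        raise ValueError("selection exceeds the number of addresses")
    return {index: name for index, name in enumerate(unique)}
```

The returned layout could then be handed to the array provider as a worklist, and the same mapping could populate a per-address database of records.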

The invention also features a method of identifying a small molecule or drug 
binding protein. Such proteins can include drug targets and adventitious drug-binding 
proteins (e.g., non-target proteins responsible for toxicity of a drug). The method 
includes providing or obtaining an array described herein, and contacting each address of the 
plurality with a drug, e.g., a labeled drug. The method can further include detecting the 
[...] prior to the detecting. 

[...] described herein. The term encompasses such an array at any stage of production, e.g., 
before any nucleic acid or polypeptide is present; when nucleic acid is disposed on the 
array, but no polypeptide is present; when a nucleic acid has been removed and a 
[...] substrate. Thus, a reagent at a first address can be positionally distinguished from a 
[...] dimensional arrays. 

The term "substrate," as used herein in the context of arrays (as opposed to a 
substrate of an enzyme), refers to a composition in or on which a nucleic acid or 
polypeptide is disposed. The substrate may be discontinuous. An illustrative case of a 
discontinuous substrate is a set of gel pads separated by a partition. 

The terms "test amino acid sequence" or "test polypeptide," as used herein, refer 
to a polypeptide of at least three amino acids that is translated on the array. The 
test amino acid sequence may or may not vary among the addresses of the array. 

The term "translation effector" refers to a macromolecule capable of decoding 
[...] 
a translation extract obtained from a cell. 

As used herein, the term "transcription effector" refers to a composition capable 
of synthesizing RNA from an RNA or DNA template, e.g., an RNA polymerase. 

The term "recognizes," as used herein, refers to the ability of a first agent to bind 
to a second agent. Preferably, the dissociation constant or apparent dissociation constant 
of binding is about 100 µM, 10 µM, 1 µM, 100 nM, 10 nM, 1 nM, 100 pM, 10 pM, or 
less. 
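
For orientation only, the practical meaning of these dissociation constants can be sketched under the assumption of simple 1:1 equilibrium binding with the second agent in excess (an assumption made here for illustration, not a limitation of the definition). Under that assumption the fraction of the first agent that is bound at a free concentration [L] of the second agent is [L] / ([L] + Kd). The function name `fraction_bound` is hypothetical.

```python
def fraction_bound(ligand_conc_molar: float, kd_molar: float) -> float:
    """Fraction bound for 1:1 equilibrium binding: [L] / ([L] + Kd)."""
    if ligand_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_molar / (ligand_conc_molar + kd_molar)
```

Under this model, half of the first agent is bound when the free concentration of the second agent equals the dissociation constant, so a lower dissociation constant corresponds to tighter recognition.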

The term "affinity tag," as used herein, refers to an amino acid, a peptide 
sequence, or a polypeptide sequence that includes a moiety capable of recognizing or 
reacting with a binding agent. 

The term "binding agent," as used herein, refers to a moiety, either a biological 
polymer (e.g., polypeptide, polysaccharide, or nucleic acid) or another chemical 
compound, which is capable of recognizing or binding an affinity tag or which is capable 
of specifically reacting with an affinity tag, e.g., to form a covalent bond. The term 
"handle" is used synonymously with binding agent. 

The term "recognition tag," as used herein, refers to an amino acid, a peptide 
sequence, or a polypeptide sequence that can be detected, directly or indirectly, on the 
array. 

As used herein, the terms "peptide," "polypeptide," and "protein" are used 
interchangeably. Generally, these terms refer to polymers of amino acids which are at 
least three amino acids in length. 

A "unique reagent" refers to a reagent that differs from a reagent at each other 
address in a plurality of addresses. The reagent can differ from the reagents at other 
addresses in terms of one or both of: structure and function. A unique reagent can be a 
molecule, e.g., a biological macromolecule (e.g., a nucleic acid, a polypeptide, or a 
carbohydrate), a cell, or a small organic compound. In the case of biological polymers, a 
structural difference can be a difference in sequence at at least one position. In addition, 
a structural difference, e.g., for polymers having the same sequence, can be a difference 
in conformation (e.g., due to allosteric modification; meta-stable folding; alternative 
native folded states; prion or prion-like properties) or a modification (e.g., covalent and 
non-covalent modifications (e.g., a bound ligand)). 

Protein microarrays representing many different proteins, as described herein, 
provide a potent high-throughput tool which can greatly accelerate the study of protein 
function. The arrays described herein avoid the process of expressing proteins in living 
cells, purifying, stabilizing, and spotting them. NAPPA arrays, as described herein, also 
reduce the number of manipulations for each polypeptide, as the polypeptide can be 
synthesized in situ in or on the array substrate. The current invention obviates the need to 
purify polypeptides and to manipulate purified protein samples onto the array by the 
straightforward and much simpler process of disposing nucleic acids. The nucleic acids 
are then simultaneously transcribed/translated in a cell-free system and immobilized in 
situ, minimizing direct manipulation of the proteins and making this approach well suited 
to high-throughput applications. Further, the cotranslation of a first and second 
polypeptide can enhance complex formation in some cases. 

In addition, the protein folding environment in cell free systems differs from the 
natural environment, allowing for a user to control a variety of parameters such as 
post-translational modifications. 

The array can be easily reprogrammed to contain different sets of proteins and 
polypeptides. 

Polypeptide arrays provide comprehensive genome-wide screens for biomolecular 
interactions. The arrays, as described herein, allow for the sampling of an entire library. 
Detecting each address of a plurality provides the certainty that each library member has 
been screened. Thus, complete coverage of known sequences is possible. For example, 
a single array containing 10,000 arrayed elements can be sufficient to yield 
10,000 results (e.g., quantitative results), each result comparable with the results of other 
elements of the array, and potentially with a result from other arrays. High-density arrays 
further expand possible coverage. 

Some embodiments described herein also provide arrays and methods for 
detecting subtle and sensitive results. As a polypeptide species, e.g., a homogenous 
species, can be provided at an address without competing species, a result for the 
individual species can be detected. In other embodiments, arrays and methods can also 
include competing species for the very purpose of removing subtle results and 
increasing the signal of strong positives. 

In sum, the arrays and methods described herein provide a versatile new platform 
for proteomics. 

DESCRIPTION OF THE DRAWINGS 

Figure 1 is a schematic representation of use of a nucleic acid programmable 
array (NAPPA) for screening protein-protein interactions. (A) Plasmids encoding target 
proteins fused with an affinity tag such as GST and query proteins fused with a reporter 
tag such as GFP are deposited in wells derivatized with affinity acceptor molecules such 
as glutathione. The fusion proteins are transcribed/translated in a cell-free expression 
system. (B) Target proteins are immobilized in the wells, as are query proteins that bind 
to target. The wells are then washed to remove unbound protein. (C) Target-query 
complexes are detected by fluorescence spectroscopy. 

Figure 2 is a Western blot of an SDS-PAGE gel of in vitro translated GST and GFP 
fusion proteins. GST- and GFP-p21 fusion proteins were transcribed and translated in a 
cell-free reticulocyte lysate. Proteins were separated by SDS-PAGE and analyzed by 
Western blotting with antibodies specific for p21. Sizes of molecular weight markers are 
shown on the left. 

Figure 3 is a schematic of a computer network for providing NAPPA microarrays. 

Figure 4 is a schematic of a computer network for providing diagnostic services. 

Figure 5 is a picture of a NAPPA array. GST is used as an affinity tag and is 
fused to test amino acid sequences such as p21. Cell-free (reticulocyte lysate) 
transcription/translation and detection of fusion proteins in wells coated with antibodies 
specific for GST. Coupled transcription/translation for 1.5 hrs at 30°C was followed by 
swirling for 1.5 hrs at 22°C to allow GST-fusion proteins to bind to the wells. A wash 
with PBS was followed by the detection of immobilized proteins with a p21-specific 
primary antibody and a mouse-specific HRP-conjugated secondary antibody. The wells 
containing expressed GST-fusion proteins provide the dominant signal, indicating 
specific immobilization of target proteins. Applications for using an array of antigens for 
determining the specificity of antibodies in a subject are described, e.g., in the 
"Diagnostic Assays" section of the text. 

Figure 6 is a graph of binding of c-jun to c-fos detected on a NAPPA array. Cell- 
free (reticulocyte lysate) transcription/translation of the interacting transcription factors 
cFos-GST and cJun, cJun only, and a no plasmid control in wells coated with antibodies 
specific for GST. Coupled transcription/translation for 1.5 hrs at 30°C was followed by 
swirling for 1.5 hrs at 22°C to allow cFos-GST to bind to the well and cJun to bind to 
cFos-GST. A wash with PBS was followed by the detection of a cFos-cJun complex with 
a cJun-specific primary antibody and a mouse-specific HRP-conjugated secondary 
antibody. Chemiluminescence was measured in RLU (Relative Luminescence Units) by 
[...] clearly detected above background signal. 

DETAILED DESCRIPTION 

All patents and references cited herein are incorporated in their entirety by 
reference. 

Substrates 

Material. Both solid and porous substrates are suitable as recipients for the 
encoding nucleic acids described herein. A substrate material can be selected and/or 

In one embodiment, the substrate is a solid substrate. Potentially useful solid 
substrates include: mass spectroscopy plates (e.g., for MALDI), glass (e.g., 
functionalized glass, a glass slide, porous silicate glass, a single crystal silicon, quartz, 
UV-transparent quartz glass), plastics and polymers (e.g., polystyrene, polypropylene, 
polyvinylidene difluoride, poly-tetrafluoroethylene, polycarbonate, PDMS, acrylic), 
metal coated substrates (e.g., gold), silicon substrates, latex, membranes (e.g., 
nitrocellulose, nylon), a glass slide suitable for surface plasmon resonance (SPR). 

In another embodiment, the substrate is porous, e.g., a gel or matrix. Potentially 
useful porous substrates include: agarose gels, acrylamide gels, sintered glass, dextran, 
meshed polymers (e.g., macroporous crosslinked dextran, sephacryl, and sepharose), and 
so forth. 

Substrate Properties. The substrate can be opaque, translucent, or transparent. 
The addresses can be distributed on the substrate in one dimension, e.g., a linear array; in 
two dimensions, e.g., a planar array; or in three dimensions, e.g., a three dimensional 
array. The solid substrate may be of any convenient shape or form, e.g., square, 
[...] shaped and attached to a means of rotation. 

In one embodiment, the substrate contains at least 1, 10, 100, [...] or more 
addresses per cm². The center to center distance can be 5 mm, [...] 
nucleic acid. In another embodiment, each address contains 100, [...] 
or more molecules of the nucleic acid. 

Substrate Modification. The substrate can be modified to facilitate the stable 
[...] 
use routine methods to modify a substrate in accordance with the desired application. 
The following are non-limiting examples of substrate modifications. 

A surface can be amidated, e.g., by silylating the substrate, e.g., with 
[...] treated surface can also be derivatized with 
[...] so it has a hydroxy, an amino (e.g., alkylamine), carboxyl group, N-hydroxy[...] 
available for reaction. The substrates can be derivatized with a mask in order to only 
derivatize limited areas; a chemical etch or UV light can be used to remove 
[...] light to cover or remove the derivatization in the areas between spots. 

Partitioned Substrates. In one preferred embodiment, each address is 
partitioned from all other addresses in order to prevent unique molecules from diffusing 
[...] addresses. The following are possible macromolecules which must remain 
at the address: a template nucleic acid encoding the test amino acid sequence; 
[...] amino acid sequence; ribosomes, e.g., monosomes and polysomes, translating the 
mRNA; and the translated polypeptide. 

The substrate can be partitioned, e.g., by depressions, grooves, or photoresist. For 
example, the substrate can be a microchip with microchannels and reservoirs etched 
therein, e.g., by photolithography. Other non-limiting examples of substrates include 
multi-welled plates, e.g., 96-, 384-, 1536-, 6144- well plates, and PDMS plates. Such 
high-density plates are commercially available, often with specific surface treatments. 
Depending on the optimal volume required for each application, an appropriate density 
plate is selected. In another embodiment, the partitions are generated by a hydrophobic 
substance, e.g., a Teflon mask, grease, or a marking pen (e.g., Snowman, Japan). 
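
As an illustrative sketch, an address index on a multi-welled plate can be converted to a well label from the plate geometry. The 8 x 12, 16 x 24, 32 x 48, and 64 x 96 layouts are the customary geometries for the plate sizes listed above; the row-major ordering, the spreadsheet-style row letters beyond "Z", and the names `PLATE_SHAPES`, `row_label`, and `well_label` are assumptions made for the example.

```python
# (rows, columns) for the plate formats listed above
PLATE_SHAPES = {96: (8, 12), 384: (16, 24), 1536: (32, 48), 6144: (64, 96)}


def row_label(row: int) -> str:
    """Convert a 0-based row index to letters: 0 -> A, 25 -> Z, 26 -> AA."""
    label = ""
    row += 1
    while row > 0:
        row, remainder = divmod(row - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


def well_label(index: int, n_wells: int) -> str:
    """Label for a 0-based, row-major well index on an n_wells plate."""
    rows, cols = PLATE_SHAPES[n_wells]
    if not 0 <= index < rows * cols:
        raise IndexError("well index out of range for this plate")
    r, c = divmod(index, cols)
    return f"{row_label(r)}{c + 1}"
```

A label of this kind could serve as the descriptor of the physical location of a nucleic acid on the array.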

In one embodiment, the substrate is designed with reservoirs isolated by protected 
regions, e.g., a layer of photoresist. For example, for each address, a translation effector 
can be isolated in one reservoir, and the nucleic acid encoding a test amino acid 
sequence can be isolated in another reservoir. A mask can be focused or placed on the 
substrate, and a photoresist barrier separating the two reservoirs can be removed by 
illumination. The translation effector and the nucleic acid reservoirs are mixed. The 
method can also include moving the substrate in order to facilitate mixing. After 
sufficient incubation for translation to occur, and for the nascent polypeptides to bind to a 
binding agent, e.g., an agent attached to the substrate, additional photoresist barriers can 
be removed with a second mask to facilitate washing a subset or all the addresses of the 
substrate, or applying a second compound to each address. 

Planar Substrates. In another embodiment, the addresses are not physically 
partitioned, but diffusion is limited on the planar substrate, e.g., by increasing the 
viscosity of the solution, by providing a matrix with small pore size which excludes large 
macromolecules, and/or by tethering at least one of the aforementioned macromolecules. 
Preferably, the addresses are sufficiently separated that diffusion during the time required 
for translation does not result in excessive displacement of the translated polypeptide to 
an address other than its original address on the array. In yet another embodiment, 
modest or even substantial diffusion to neighboring addresses is permitted. Results, e.g., 
a signal of a label, are processed, e.g., using a computer system, in order to determine the 
position of the center of the signal. Thus, by compensating for radial diffusion, the 
unique address of the translated polypeptide can be accurately determined. 
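
One simple way to locate the center of a diffused signal, offered here only as a sketch, is an intensity-weighted centroid followed by assignment to the nearest known address. The function names, the nested-list image representation, and the nearest-address rule are assumptions for illustration; they are not stated to be the processing actually used.

```python
from typing import Dict, Sequence, Tuple


def signal_centroid(image: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return the intensity-weighted (row, column) center of a signal image."""
    total = 0.0
    sum_row = 0.0
    sum_col = 0.0
    for r, line in enumerate(image):
        for c, value in enumerate(line):
            total += value
            sum_row += r * value
            sum_col += c * value
    if total <= 0:
        raise ValueError("image contains no signal")
    return sum_row / total, sum_col / total


def nearest_address(center: Tuple[float, float],
                    addresses: Dict[str, Tuple[float, float]]) -> str:
    """Return the name of the address closest to the signal center."""
    def squared_distance(name: str) -> float:
        row, col = addresses[name]
        return (row - center[0]) ** 2 + (col - center[1]) ** 2
    return min(addresses, key=squared_distance)
```

Because radial diffusion is roughly symmetric about the point of origin, the weighted center remains close to the original address even when the signal itself has spread toward neighboring addresses.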

Three-dimensional Substrates. A three-dimensional substrate can be generated, 
e.g., by successively applying layers of a gel matrix on a substrate. Each layer contains a 
plurality of addresses. The porosity of the layers can vary, e.g., so that alternating layers 
have reduced porosity. 

In another embodiment, a three-dimensional substrate includes stacked two- 
dimensional substrates, e.g., in a tower format. Each two-dimensional substrate is 
accessible to a dispenser and detector. 

Micromachined chips. Chips are made with glass and plastic materials, using 
rectangular or circular geometry. Wells and fluid channels are machined into the chip, 
and then the surfaces are derivatized. Plasmid solutions would be spotted on the chip 
and allowed to dry, and then a cover would be applied. Cell-free transcription/translation 
mix would be added via the micromachined channels. The cover prevents evaporation 
during incubation. A humidity-controlled chamber can be used to prevent evaporation. 

CD format. A disk geometry (also termed "CD format") is another suitable 
substrate for the microarray. Sample addition and reactions are performed while the disk 
is spinning (see PCT WO 00/40750; WO 97/21090; GB patent application 9809943.5; 
"The next small thing" (Dec. 9, 2000) Economist Technology Quarterly p. 8; PCT WO 
91/16966; Duffy et al. (1999) Analytical Chemistry, 71, 20, (1999), 4669-4678). Thus, 
centrifugal force drives the flow of transcription/translation mix and wash solutions. 

The disc can include sample-loading areas, reagent-loading areas, reaction 
chambers, and detection chambers. Such microfluidic structures are arranged radially on 
the disc with the originating chambers located towards the disc center. Samples from a 
microtiter plate can be loaded using a liquid train and a piezo dispenser. Multiple 
samples can be separated in the liquid train by air gaps or an inert solution. The piezo 
dispenser then dispenses each sample onto appropriate application areas on the CD 
surface, e.g., a rotating CD surface. The volume dispensed can vary, e.g., less than about 
10 pL, 50 pL, 100 pL, 500 pL, 1 nL, 5 nL, or 50 nL. After entry on the CD, the 
centripetal force conveys the dispensed nucleic acid sample into appropriate reaction 
chambers. Flow between chambers can be guided by barriers, transport channels, and/or 
surface interactions (e.g., between the walls and the solution). The depth of channels and 
chambers can be adjusted to control volume and flow rate in each area. 

A master CD can be made by deep reactive ion etching (DRIE) on a 6 inch silicon 
wafer. The master disc can be plated and used as a model to manufacture additional CDs 
by injection molding (e.g., Amic AB, Uppsala, Sweden). 

A stroboscope can be used to synchronize the detector with the rotation of the 
CD in order to track individual detection chambers. 

Transcription Effectors 

RNA-directed RNA polymerases and DNA-directed RNA polymerases are both 
suitable transcription effectors. 

DNA-directed RNA polymerases include bacteriophage T7 polymerase, phage 
T3, phage [...], Salmonella phage SP6, or Pseudomonas phage gh-1, as well as archaeal 
[...] T7 promoter (see, e.g., U.S. Patent No. 4,952,496), which can be appropriately positioned 
[...] initiation sites are selected according to the specificity of the polymerase. 
[...] RNA polymerase. 

Translation Effectors 

In one embodiment, the transcription/translation mix is in a minimal volume, and 
this volume is optimized for each application. The volume of translation effector at each 
address can be less than about [...] L. During dispensing and incubation, the array can 
be maintained in an environment to prevent evaporation, 
e.g., by covering the wells or by maintaining a humid atmosphere. 

In another embodiment, the entire substrate can be coated or immersed in the 
translation effector. One possible translation effector is a translation extract prepared 
from cells. The translation extract can be prepared, e.g., from a variety of cells, e.g., 
[...] viral RNA polymerase, e.g., T7 RNA polymerase. In a preferred [...] 
washing. The translation extract can be supplemented, e.g., with 
[...]. 

[...] translation extract also includes an amber, ochre, or opal [...] 
can be modified to contain an unnatural amino acid. In [...], the 
translation extract further includes a chaperone, e.g., an agent which unfolds or 
[...] polypeptides (e.g., a recombinant purified chaperone, e.g., heat shock factors, 
GroEL/ES and related chaperones, and so forth). In another embodiment, the translation 
extract includes additives (e.g., glycerol, polymers, etc.) to alter the viscosity of the 
extract. 

Affinity Tags 

[...] used as an affinity tag. The other member of the specific binding pair is attached 
to the substrate, either directly or indirectly. 

One class of specific binding pair is a peptide epitope and the monoclonal 
antibody specific for it. Any epitope to which a specific antibody is or can be made 
[...] for general methods of providing an epitope tag. Exemplary epitope tags 
include [...] (Evan et al. (1985) Mol. Cell. Biol. 5:3610-3616), VSV-G, FLAG, and 
6-histidine (see, e.g., German Patent No. DE 19507 166). 

An antibody can be coupled to a substrate of an array, e.g., indirectly using 
Staphylococcus aureus protein A, or streptococcal protein G. The antibody can be 
[...] succinimidyl ester. The test polypeptides with epitopes such as Flag, HA, or myc 
are bound to antibody-coated plates. 

Another class of specific binding pair is a small organic molecule, and a 
polypeptide sequence that specifically binds it. See, for example, the specific binding 
pairs listed in Table 1. 

Table 1 

Protein                         Ligand 
glutathione-S-transferase       glutathione 
chitin binding protein          chitin 
cellulase (CBD)                 cellulose 
maltose binding protein         amylose, or maltose 
dihydrofolate reductase         methotrexate 
FKBP                            FK506 

Additional art-known methods of tethering proteins, e.g., the use of specific 
binding pairs, are suitable for the affinity or chemical capture of polypeptides on the 
array. Appropriate substrates include commercially available streptavidin and 
avidin-coated plates, for example, 96-well Pierce Reacti-Bind Metal Chelate Plates or 
Reacti-Bind Glutathione Coated Plates (Pierce, Rockford, IL). Histidine- or GST-tagged 
test polypeptides are immobilized on either 96-well Pierce Reacti-Bind Metal Chelate 
Plates or Reacti-Bind Glutathione Coated Plates, respectively, and unbound proteins are 
optionally washed away. 

In one embodiment, the polypeptide is an enzyme, e.g., an inactive enzyme, and 
the ligand is its substrate. Optionally, the enzyme is modified so as to form a covalent 
bond with its substrate. In another embodiment, the polypeptide is an enzyme, and the 
ligand is an enzyme inhibitor. 

Yet another class of specific binding pair is a metal and a polypeptide sequence 
which can chelate the metal. An exemplary pair is Ni2+ and the hexa-histidine sequence 
(see U.S. Patent Nos. 4,877,830; 5,047,513; 5,284,933; and 5,130,663). 

In still another embodiment, the affinity tag is a dimerization sequence, e.g., a 
homodimerization or heterodimerization sequence, preferably a heterodimerization 
sequence. In one illustrative example, the affinity tag is a coiled-coil sequence, e.g., the 
[...]. The binding agent coupled to the array [...]. 

[...] provided by an unnatural amino acid, e.g., with a side chain having functional 
properties different from a naturally occurring amino acid. The binding agent attached 
to the [...] bind or react with the affinity tag. [...] 
polypeptide. 



Disposal of Nucleic Acid Sequences on Arrays 

U.S. Patent No. 6,112,605 describes a device for dispensing [...] 
array. A unique capture probe at each address [...] 
polycationic surface on the substrate. 

The capture probe can itself be synthesized in situ, e.g., by a light-directed 
method (see, e.g., U.S. Patent No. 5,445,934), or by being spotted or disposed at the 
addresses. The capture probe can hybridize to the nucleic acid encoding the test 
polypeptide. In a preferred embodiment, the capture probe anneals to the T7 promoter 
region of a single stranded nucleic acid encoding the test amino acid sequence. In 
another embodiment, the capture probe is ligated to the encoding nucleic acid sequence. 
In yet another embodiment, the capture probe is a padlock probe. In still another 
embodiment, the capture probe hybridizes to a nucleic acid encoding a test amino acid 
sequence, e.g., a unique region of the nucleic acid, or to a nucleic acid sequence tag 
provided on the nucleic acid for the purposes of identification. 
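
A capture probe that anneals to a defined region of a single-stranded template is the reverse complement of that region. The sketch below illustrates this for the T7 promoter region. The promoter string used is the commonly cited 18-base consensus and is an assumption of this example, as are the function names; the description itself does not specify a probe sequence.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Commonly cited T7 promoter consensus, written 5'->3' (assumed for illustration).
T7_PROMOTER = "TAATACGACTCACTATAG"

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def capture_probe_for(template, target=T7_PROMOTER):
    """Return a probe (5'->3') complementary to `target` if it occurs in `template`."""
    if target not in template.upper():
        raise ValueError("target region not found in template")
    return reverse_complement(target)

if __name__ == "__main__":
    template = "GGG" + T7_PROMOTER + "GGAGACCACC"   # hypothetical single strand
    print(capture_probe_for(template))
```

The same helper could be pointed at a unique region or a sequence tag instead of the promoter by passing a different `target`.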

Disposed Insoluble Substrates 

One or more insoluble substrates having a binding agent attached can be disposed 
at each address of the array. The insoluble substrates can further include a unique 
identifier, such as a chemical, nucleic acid, or electronic tag. Chemical tags include, 
e.g., those used for recursive identification in "split and pool" combinatorial 
syntheses. Kerr et al. ((1993) J. Am. Chem. Soc. 115:2529-2531), Nikolaiev et al. 
((1993) Peptide Res. 6:161-170), and Ohlmeyer et al. ((1993) Proc. Natl. Acad. Sci. 
USA 90:10922-10926) describe methods for coding and decoding such tags. A nucleic 
acid tag can be a short 
oligonucleotide sequence that is unique for a given address. The nucleic acid tag can be 
coupled to the particle. In another embodiment, the encoding nucleic acid provides a 
unique identifier. The encoding nucleic acid can be coupled or attached to the particle. 
Electronic tags include transponders as mentioned below. The insoluble substrate can be 
a particle (e.g., a nanoparticle, or a transponder), or a bead. 
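
One way to obtain short oligonucleotide tags that are unique for each address is sketched below. The greedy selection, the minimum pairwise difference between tags, and the tag length are all assumptions chosen for illustration; the description only requires that each tag be unique for a given address.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length tags differ."""
    return sum(x != y for x, y in zip(a, b))

def make_tags(n_addresses, length=6, min_distance=3):
    """Greedily pick DNA tags that differ pairwise in at least `min_distance` positions."""
    if n_addresses <= 0:
        return []
    chosen = []
    for candidate in product("ACGT", repeat=length):
        tag = "".join(candidate)
        if all(hamming(tag, other) >= min_distance for other in chosen):
            chosen.append(tag)
            if len(chosen) == n_addresses:
                return chosen
    raise ValueError("not enough tags; increase length or lower min_distance")

def assign_tags(addresses, **kwargs):
    """Map each address label to its own tag."""
    return dict(zip(addresses, make_tags(len(addresses), **kwargs)))

if __name__ == "__main__":
    print(assign_tags(["A1", "A2", "A3", "A4"]))
```

Requiring several differences between any two tags means that a single read or synthesis error in a tag is less likely to be mistaken for the tag of another address.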

Beads. The disposed particle can be a bead, e.g., constructed from latex, 
polystyrene, agarose, a dextran (sepharose, sephacryl), and so forth. 

Transponders. U.S. Patent No. 5,736,332 describes methods of using small 
particles containing a transponder on which a handle or binding agent can be affixed. 
The identity of the particle is discerned by a read-write scanner device which can encode 
and decode data, e.g., an electronic identifier, on the particle (see also Nicolaou et al. 
(1995) Angew. Chem. Int. Ed. Engl. 34:2289-2291). Test polypeptides are bound to the 
transponder by attaching to the handle or binding agent. 

Disposed Nucleic Acid Sequences 

Any appropriate nucleic acid for translation can be disposed at an address of the 
array. The nucleic acid can be an RNA, a single stranded DNA, a double stranded DNA, 
or combinations thereof. For example, a single-stranded DNA can include a hairpin loop 
at its 5' end which anneals to the T7 promoter sequence to form a duplex in that region. 
The nucleic acid can be an amplification product, e.g., from PCR (U.S. Patent Nos. 
4,683,196 and 4,683,202); rolling circle amplification ("RCA," U.S. Patent No. 
5,714,320); isothermal RNA amplification or NASBA (U.S. Patent Nos. 5,130,238; 
5,409,818; and 5,554,517); and strand displacement amplification (U.S. Patent No. 
5,455,166). 

In one embodiment, the sequence of the encoding nucleic acid is known prior to 
being disposed at an address. In another embodiment, the sequence of the encoding 
nucleic acid is unknown prior to disposal at an address. For example, the nucleic acid 
can be randomly obtained from a library. The nucleic acid can be sequenced after the 
address on which it is placed has been identified as encoding a polypeptide of interest. 

Amplification in situ 

A nucleic acid disposed on the array can be amplified directly on the array by a 
variety of methods, e.g., PCR (U.S. Patent Nos. 4,683,196 and 4,683,202); rolling circle 
amplification ("RCA," U.S. Patent No. 5,714,320); isothermal RNA amplification or 
NASBA; and strand displacement amplification (U.S. Patent No. 5,455,166). 

Isothermal RNA amplification or "NASBA" is well described in the art (see, e.g., 
U.S. Patent Nos. 5,130,238; 5,409,818; and 5,554,517; Romano et al. (1997) Immunol. 
Invest. 26:15-28; and technical literature for "RnampliFire™," Qiagen, CA). Isothermal 
RNA amplification is particularly suitable as reactions are homogeneous, can be 
performed at ambient temperatures, and produce RNA templates suitable for translation. 

Vectors for Expression 

Coding regions of interest can be taken from a source plasmid, e.g., containing a 
full length gene and convenient restriction sites, or sites for homologous or site-specific 
recombination, and transferred to an expression vector. The expression vector includes a 
promoter and an operably linked coding region, e.g., encoding an affinity tag, such as one 
described herein. The tag can be N- or C-terminal. The vector can carry a 
cap-independent translation enhancer (CITE, or IRES, internal ribosome entry site) for 
increased in vitro translation of RNA prepared from cloned DNA sequences. The fusion 
proteins will be generated with commercially available in vitro transcription/translation 
kits such as the Promega TNT Coupled Reticulocyte Lysate Systems or TNT Coupled 
Wheat Germ Extract Systems. Cell-free extracts containing translation components 
derived from microorganisms, such as a yeast or a bacterium, can also be used. 

In addition, the vector can include a number of regulatory sequences such as a 
transcription promoter; a transcription regulatory sequence; an untranslated leader 
sequence; a sequence encoding a protease site; a recombination site; a 3' untranslated 
sequence; a transcriptional terminator; and an internal ribosome entry site. 

The vector or encoding nucleic acid can also include a sequence encoding an 
intein. Methods of using inteins for the regulated removal of an intervening sequence are 
described, e.g., in U.S. Patent Nos. 5,496,714 and 5,834,247. Inteins can be used to 
cyclize, ligate, and/or polymerize polypeptides, e.g., as described in Evans et al. (1999) J 
Biol Chem 274:3923 and Evans et al. (1999) J Biol Chem 274:18359. 

Exemplary Useful Sequences 

Naturally occurring sequences. Useful encoding nucleic acid sequences for 
creating arrays include naturally occurring sequences. Such nucleic acids can be stored 
in a repository, see below. Nucleic acid sequences can be procured from cells of species 
from the kingdoms of animals, bacteria, archaebacteria, plants, and fungi. Non-limiting 
examples of eukaryotic species include: mammals such as human, mouse (Mus 
musculus), and rat; insects such as Drosophila melanogaster; nematodes such as 
Caenorhabditis elegans; other vertebrates such as Brachydanio rerio; parasites such as 
Plasmodium falciparum and Leishmania major; fungi such as yeasts (e.g., Histoplasma, 
Cryptococcus, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, 
and the like); and plants such as Arabidopsis thaliana, rice, maize, wheat, tobacco, 
tomato, potato, and flax. Non-limiting examples of bacterial species include E. coli, B. 
subtilis, Mycobacterium tuberculosis, Pseudomonas aeruginosa, Vibrio cholerae, 
Thermotoga maritima, Mycoplasma pneumoniae, Mycoplasma genitalium, Helicobacter 
pylori, Neisseria meningitidis, and Borrelia burgdorferi. In addition, amino acid 
sequences encoded by viral genomes can be used, e.g., a sequence from rotavirus, hepatitis 
A virus, hepatitis B virus, hepatitis C virus, herpes virus, papilloma virus, or a retrovirus 
(e.g., HIV-1, HIV-2, HTLV, SIV, and STLV). 

In a preferred embodiment, a cDNA library is prepared from a desired tissue of a 
desired species in a vector described herein. Colonies from the library are picked, e.g., 
using a robotic colony picker. DNA is prepared from each colony and used to program a 
NAPPA array. 

Artificial sequences. The encoding nucleic acid sequence can encode artificial 
amino acid sequences. Artificial sequences can be randomized amino acid sequences, 
patterned amino acid sequences, computer-designed amino acid sequences, and 
combinations of the above with each other or with naturally occurring sequences. Cho et 
al. (2000) J Mol Biol 297:309-19 describe methods for preparing libraries of randomized 
and patterned amino acid sequences. Similar techniques using randomized 
oligonucleotides can be used to construct libraries of random sequences. Individual 
sequences in the library (or pools thereof) can be used to program a NAPPA array. 

Dahiyat and Mayo (1997) Science 278:82-7 describe an artificial sequence 
designed by a computer system using the dead-end elimination theorem. Similar systems 
can be used to design amino acid sequences, e.g., based on a desired structure, such that 
they fold stably. In addition, computer systems can be used to modify naturally occurring 
sequences in order 

Mutagenesis. The array can be used to display the products of a mutagenesis or 
selection. Examples of mutagenesis procedures include cassette mutagenesis (see e.g., 
Reidhaar-Olson and Sauer (1988) Science 241:53-7), PCR mutagenesis (e.g., using 
manganese to decrease polymerase fidelity), in vivo mutagenesis (e.g., by transfer of the 
nucleic acid in a repair deficient host cell), and DNA shuffling (see U.S. Patent Nos. 
5,605,793; 5,830,721; and 6,132,970). Examples of selection procedures include 
[...] 

[...] such that each possible substitution is present at a unique position on the array. 
[...] Alternatively, the range of variation can be restricted to reasonable or 
[...] 

[...] polypeptide orthologs from different species, polypeptide [...] 
metabolic pathway); and the entire polypeptide complement of an organism. 

Repositories of Nucleic Acids 

The arrays described herein can be produced from nucleic acid sequences in a 
[...]. For example, commercial and academic institutions are providing large- 
[...]. For example, the collection can contain 500, 1,000, 10,000, [...] 
full-length sequences. One example of such a [...] is the 
FLEX (Full Length Expression) Repository (Harvard Institute of Proteomics, Harvard 
[...]). [...] in bacteria [...] 
coding region. For example, each clone can be accessible to a robot and can be 
[...], e.g., by a locator (e.g., a bar code, a transponder, or other 
[...]). Thus, a desired construct can be obtained from the repository through a 
[...] interface without manual intervention. The computing unit can also 
[...] any information gathered by experimentation or by other databases [...] 
history as well as any relevant links to other biological databases. 

Clones in the collection can be [...] and placed in a format 
compatible with a recombinational cloning system that enables automated directional and 
[...] vector, obviating [...] 
efficient, rapid, and easily scaled for high-throughput performance. 

Recombinational Cloning 

Methods for recombinational cloning are described, e.g., in U.S. Patent No. 5,888,732; 
Walhout et al. (2000) Science 287:116; and Liu et al. (1998) Curr. Biol. 8(24):1300-9. 
Recombinational cloning exploits the activity of certain enzymes that cleave DNA at 
specific sequences and then rejoin [...] other matching [...] target vector. [...] 
Different versions of the expression vectors are [...] within the expression vector at 
a location appropriate to receive the coding nucleic acid sequence harbored in the 
master clone. Particular attention is given to ensure that the reading frame is 
maintained for translation fusions, e.g., to an affinity or recognition tag. To shuttle 
the gene into the target vector, the master clone vector containing a nucleic acid 
sequence of interest and the target vector are mixed with the recombinase. 

The mixture is transformed into an appropriate bacterial host strain. The master 
clone vector and the target vector can contain different antibiotic selection markers. 
Moreover, the target vector can contain a gene that is toxic to bacteria that is located 
between the recombination sites such that excision of the toxic gene is required during 
recombination. Thus, the cloning products that are viable in bacteria under the 
appropriate selection are almost exclusively the desired construct. In practice, the 
efficiency of cloning the desired product approaches 100%. 

To construct the repository, a computer system can be used to automatically 
design primers based on sequence information, e.g., in a database. Each gene is 
amplified from an appropriate cDNA library using PCR. The recombination sequences 
are incorporated into the PCR primers so the amplification product can be directly 
recombined into a master vector. As described above, because the master vector carries a 
toxic gene that is lost only after successful recombination, the desired master clone is the 
only viable product of the process. Once in the master vector, the gene can be verified, 
e.g., by sequencing methods, and then shuttled into any of the many available expression 
vectors. 

In a preferred embodiment, each gene is cloned twice, i.e., into two master 
vectors. In one clone, the stop codon is removed to provide for carboxy-terminal fusions. 
In the other clone, the native stop codon is maintained. This is particularly important for 
polypeptides whose function is dependent on the integrity of their carboxy-terminus. 
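
The automated primer design described above, including the two versions of each clone (with and without the native stop codon), can be sketched as follows. This is a minimal illustration under stated assumptions: the recombination site strings passed in are placeholders for whatever sequences the chosen cloning system requires, and the function names and annealing length are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def design_primers(cds, site_5p, site_3p, anneal_len=18, keep_stop=True):
    """Return (forward, reverse) primers carrying recombination sites.

    site_5p / site_3p are placeholder recombination sequences (hypothetical).
    keep_stop=False drops the native stop codon to allow carboxy-terminal fusions.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0 or cds[-3:] not in STOP_CODONS:
        raise ValueError("expected an in-frame coding region ending in a stop codon")
    body = cds if keep_stop else cds[:-3]
    forward = site_5p + body[:anneal_len]                       # anneals at the start codon
    reverse = site_3p + reverse_complement(body[-anneal_len:])  # anneals at the 3' end
    return forward, reverse

if __name__ == "__main__":
    cds = "ATGGCTAGCAAAGGTGAAGAACTGTTTACCGGTTAA"   # made-up coding region
    print(design_primers(cds, "AAAA", "CCCC", anneal_len=9, keep_stop=True))
    print(design_primers(cds, "AAAA", "CCCC", anneal_len=9, keep_stop=False))
```

Only the reverse primer differs between the two clones, so a single forward primer per gene can serve both master vectors.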

Genes in the repository are thus suitably prepared for analysis in activity screens 
and functional genomics experiments using the NAPPA array. Because of the ease of 
shuttling multiple genes to any expression vector en masse, these clones can be prepared 
in multiple array formats, such as those described herein, for a variety of functional 
assays. 

Liu et al. (1998) Curr. Biol. 8:1300 describe a Cre-lox based site-specific 
recombination system for the directional cloning of PCR products. This system uses 
Cre-Lox recombination and a single recombination site. Here again the master clone is 
mixed with a target vector and recombinases. However, instead of swapping fragments, 
the recombination product is a double plasmid connected at the recombination site. This 
then juxtaposes one end of the gene (whichever end was near the recombination site) with 
the desired signals in the expression plasmid. 

The clone can include a vector sequence and a full-length coding region of 
interest. The coding region can be flanked by marker sequences for site-specific 
recombinational cloning, e.g., Cre-Lox sites, or lambda int sites (see, e.g., Uetz et al. 
(2000) Nature 403:623-7). Also, the coding region can be flanked by marker sequences 
for homologous recombination (see, e.g., Martzen et al. (1999) Science 286:1153-5). For 
homologous recombination almost any sequence can be used that is present in the vector 
and appended to the coding region. For example, the sequence can encode an epitope or 
protease cleavage site. After recombination, the full-length coding region can be 
efficiently shuttled into a recipient plasmid of choice. For example, the recipient plasmid 
can have nucleic acid sequences encoding any one or more of the following optional 
features: an affinity tag, a protease site, and an enzyme or reporter polypeptide. The 
recipient plasmid can also have a promoter for RNA polymerase, e.g., the T7 RNA 
polymerase promoter and/or regulatory sites; a transcriptional terminator; and a 
translational enhancer, e.g., a Shine-Dalgarno site, or a Kozak consensus sequence. 

Pool Method 

A large number of proteins can be screened in one or more passes by the 
following pooling method. The method uses a first array wherein each address includes a 
pool of encoding nucleic acid sequences. Addresses identified in a screen with the first 
array are optionally further analyzed by splitting the pool into different addresses in at 
least a second array. 

Each address of the first array includes a plurality of nucleic acid sequences, each 
encoding a unique test amino acid sequence and an affinity tag. Thus, each address 
encodes a pool of test polypeptides. The pools can be random collections, e.g., fractions 
of a cDNA library, or specific collections of sequences, e.g., each address can contain a 
family of related or homologous sequences, a set of sequences expressed under similar 
conditions, or a set of sequences from a particular species (e.g., of pathogens). 
Preferably, a test polypeptide is encoded at only one address of the array. 

An interaction detected at a given address, e.g., by the presence of the second amino 
acid sequence at that address, can be further analyzed (e.g., deconvolved) by providing a 
second array similar to the first, except that each address contains a nucleic acid 
sequence encoding a single test polypeptide, the test polypeptide being one of the 
plurality of test polypeptides at the given address of the first array. 
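
The deconvolution step can be pictured with the following sketch, in which each pool that scored on the first array is split so that every member occupies its own address on a second array. The data layout (dictionaries keyed by address) and all identifiers are assumptions made for illustration only.

```python
def build_second_array(first_array, hit_addresses):
    """Split each hit pool from the first array into single-sequence addresses.

    first_array: dict mapping address -> list of encoding sequence identifiers.
    hit_addresses: addresses of the first array that scored in the screen.
    Returns a dict mapping (hit address, index) -> a single identifier.
    """
    second_array = {}
    for address in hit_addresses:
        for index, clone_id in enumerate(first_array[address]):
            second_array[(address, index)] = clone_id
    return second_array

if __name__ == "__main__":
    pools = {"A1": ["clone1", "clone2", "clone3"], "A2": ["clone4", "clone5"]}
    print(build_second_array(pools, ["A2"]))
```

Keying each second-array address by the originating pool keeps the link between the two screens, so a hit on the second array can be traced back to the first.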

However, arrays with specific collections may not require using a second array. 
For example, in diagnostic applications, it may suffice to merely identify a collection of 
sequences. 

In another embodiment, an array is used to deconvolve a pool of library sequences 
identified in a screen that did not rely on arrays to screen initial pools. For example, 
Kirschner and colleagues describe an in vitro screening method to identify protein 
interaction partners using radioactively labeled protein pools derived from small pool 
cDNA libraries (Lustig et al. (1997) Methods Enzymol. 283:83-99). Individual 
members of such pools can be identified using an array in which unique nucleic acid 
components of the pool are disposed at unique addresses on the NAPPA platform. An 
array of sufficient density obviates the need to iteratively subdivide the pool. 

In yet another embodiment, the substrate includes a plurality of nucleic acids at 
each address. The plurality of nucleic acid sequences at one address encodes a different 
plurality of test polypeptides from the plurality at another address. Each plurality is 
such that it encodes 
the components of a protein complex, e.g., a heterodimer, or larger multimer. Exemplary 
protein complexes include multi-component enzymes, cytoskeletal components, 
transcription complexes, and signalling complexes. The array can have a different 
protein complex present at each address, or variation in protein complex composition at 
each address (e.g., for complexes with optional components, the presence or absence of 
such components can be varied among the addresses). One or more members of the 
plurality of test polypeptides can have an affinity tag; preferably, just one member has an 
affinity tag. 

In still another embodiment, the plurality of encoding nucleic acids at each 
address are selected by a computer program which identifies groups of encoding nucleic 
acids for each address such that if an address is identified, the relevant polypeptide 
sequence can be determined with little or no ambiguity. For example, for MALDI-TOF 
detection methods, encoding nucleic acids are grouped such that masses of peptide 
[...] or non-overlapping. Thus, detection of a peptide mass from time-of-flight 
data at an address would unambiguously identify the relevant polypeptide. 
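
A grouping program of the kind described can be sketched as a greedy assignment: each encoding nucleic acid is placed in the first pool in which none of its expected peptide masses falls within a tolerance of a mass already present. The greedy strategy, the tolerance value, and the input format are assumptions for illustration; the description does not specify how the program forms its groups.

```python
def group_by_distinct_mass(peptide_masses, tolerance=1.0):
    """Assign identifiers to pools so that masses within a pool do not overlap.

    peptide_masses: dict mapping identifier -> list of expected peptide masses (Da).
    tolerance: two masses closer than or equal to this value count as overlapping.
    Returns a list of pools, each a list of identifiers.
    """
    pools = []  # each entry: (identifiers in the pool, masses already in the pool)
    for ident, masses in peptide_masses.items():
        for members, pooled in pools:
            if all(abs(m - p) > tolerance for m in masses for p in pooled):
                members.append(ident)
                pooled.extend(masses)
                break
        else:
            # No existing pool is compatible, so start a new one.
            pools.append(([ident], list(masses)))
    return [members for members, _ in pools]

if __name__ == "__main__":
    expected = {"a": [1000.0, 1500.0], "b": [1000.5, 2000.0], "c": [1200.0]}
    print(group_by_distinct_mass(expected))
```

A greedy pass does not guarantee the smallest possible number of pools, but it does guarantee that any observed mass within a pool maps to a single encoding nucleic acid at the chosen tolerance.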

Unnatural Amino Acids 

PCT WO90/05785 describes the use of in vitro translation extracts to include 
unnatural amino acids at defined positions within a polypeptide. In this method, a stop 
codon, e.g., an amber codon, is inserted in the nucleic acid sequence encoding the 
polypeptide at the desired position. An amber-suppressing tRNA with an unnatural 
amino acid is prepared artificially and included in the translation extract. This method 
[...] acids, e.g., an amino acid with chemical properties not available from the standard 
[...] acid with a keto group. Keto groups are particularly useful chemical handles as 
they are stable in an unprotected form in cell extracts, and able to react with hydrazides 
and hydroxylamines to form hydrazones and oximes (Cornish et al. (1996) JACS 118:8150). 
Thus, the amber codon can be used as an affinity tag to attach translated proteins to a 
hydrazide attached to the substrate. 
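
Placing an amber codon at a chosen position of a coding sequence can be illustrated as below. The sketch substitutes the existing codon at the chosen position with TAG, which is one assumed way of introducing the stop codon; the function name and the example sequence are hypothetical.

```python
AMBER = "TAG"

def place_amber(cds, codon_index):
    """Replace the codon at 0-based `codon_index` with an amber (TAG) codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(cds) // 3
    if not 0 < codon_index < n_codons - 1:
        raise ValueError("position must be internal (not the start or final codon)")
    start = 3 * codon_index
    return cds[:start] + AMBER + cds[start + 3:]

if __name__ == "__main__":
    # Made-up coding region: ATG GCT AAA GGT TAA
    print(place_amber("ATGGCTAAAGGTTAA", 2))
```

Substitution keeps the downstream reading frame unchanged, so the remainder of the polypeptide is translated normally once the suppressor tRNA reads through the amber codon.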

General Applications 

The polypeptide arrays described herein can be used in a number of applications. 
Non-limiting examples are described as follows. The regulation of cellular processes, 
including control of gene expression, can be investigated by examining protein-protein, 
protein-peptide, and protein-nucleic acid interactions; antibodies can be screened against 
an array of potential antigens for profiling antibody specificity or to search for common 
epitopes; proteins can be assayed for discrete biochemical activities; and the disruption of 
protein-ligand interactions by synthetic molecules or the direct detection of 
protein-synthetic molecule interactions can aid drug discovery. Given the versatility of 
programming the array, elements at each address are easily customized as appropriate for 
the desired application. 

Protein Activity Detection 

A nucleic acid programmable array can be used to detect a specific protein 
activity. Each address of the array is contacted with the reagents necessary for an activity 
assay. Then an address having the activity is detected to thereby identify a protein having 
a desired activity. An activity can be detected by assaying for a product produced by a 
protein activity or by assaying for a substrate consumed by a protein activity. 

Protein Interaction Detection 

A nucleic acid programmable array can be used to detect protein-protein 
interactions. Moreover, the array can be used to generate a complete matrix of 
protein-protein interactions such as for a protein-interaction map (see, e.g., Walhout et 
al. (2000) Science 287:116-122; Uetz et al. (2000) Nature 403:623-631; and Schwikowski 
(2000) Nature Biotech. 18:1257). The matrix can be generated for the complete 
complement of a genome, proteins known or suspected to be co-regulated, proteins 
known or suspected to be in a regulatory network, and so forth. 
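
The layout of a complete interaction matrix can be sketched as follows: every ordered pair of encoding nucleic acids is assigned to one address, with one member carrying the affinity tag and the other the recognition tag. The row and column convention and the identifiers are assumptions made for illustration.

```python
from itertools import product

def interaction_matrix_layout(gene_ids):
    """Map each array address (row, column) to a (tagged target, query) pair.

    Rows carry the affinity-tagged target and columns carry the
    recognition-tagged query, so every ordered pair is tested once.
    """
    return {
        (row, col): (target, query)
        for (row, target), (col, query) in product(enumerate(gene_ids), repeat=2)
    }

if __name__ == "__main__":
    for address, pair in interaction_matrix_layout(["geneA", "geneB", "geneC"]).items():
        print(address, pair)
```

Testing both orientations of each pair doubles the number of addresses but can help when a tag interferes with binding in only one orientation.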

The detection of protein-protein interactions, e.g., between a first and a second 
protein, entails providing at an address a nucleic acid encoding the first polypeptide and 
an affinity tag, and a nucleic acid encoding a second polypeptide and a recognition tag, 
e.g., a recognition tag described below. 

In one embodiment, after translation of both nucleic acids, the array is washed to 
remove unbound proteins and the translation effector. Detection of an address at which 
the second polypeptide remains bound is indicative of a protein-protein interaction 
between the first and second polypeptide of that address. 



98 



Attorney 



No: 00246-260001 /HI 803 



In another embodiment, a third or competing polypeptide can be present during 
the binding step, e.g., a third encoding nucleic acid sequence lacking a tag can be 
included at the address. 

In yet another embodiment, the stringency or conditions of the binding or washing 
steps are varied as appropriate to identify interactions at any range of affinity and/or 
specificity. 

Recognition Tags 

A variety of recognition tags can be used. For example, an epitope to which an 
antibody is available can be used as a recognition tag. The tag can be placed N- or 
C-terminal to the sequence of interest. The tag is recognized, e.g., directly, or indirectly 
(e.g., by binding of an antibody). 

Green fluorescent protein. Coding regions of interest are taken from the FLEX 
repository and transferred into fusion vectors encoding either an N- or C-terminal green 
fluorescent protein (GFP) tag. These vectors have been made (Figure 2), and the 
backbones are similar to those encoding the poly-histidine and GST tags. The 
GFP-tagged proteins, the query, are co-transcribed/translated with the immobilized target 
protein. Target-query complexes are allowed to form, and unbound protein is washed 
away. Target-query complexes are then detected by fluorescence spectroscopy (Spectra 
Max Gemini, Molecular Devices). The environment of a fluorophore has a strong effect 
on the quantum yield of fluorescence (i.e., the ratio of emitted to absorbed photons) 
through collisional processes and resonance energy transfer (a non-radiative process), so 
the concentration of target-query complexes that gives an acceptable signal-to-noise ratio 
will have to be determined experimentally. 

Fluorescence polarization can be used to detect the recognition tag while 
circumventing the need for immobilization and wash steps to detect protein complexes. 
When GFP-tagged query is bound to target, the polarization of the fluorescence of GFP 
increases due to the reduced mobility of the complex, and this increase in polarization 
can be measured. Conventional fluorescence spectroscopy and fluorescence polarization 
methods can be used to detect protein-protein interactions. See, e.g., Garcia-Parajo et al. 
(2000) Proc. Natl. Acad. Sci. USA 97, 7237-7242. 
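
For orientation, fluorescence polarization is conventionally computed from the emission intensities measured parallel and perpendicular to the excitation polarization, P = (I_par - G * I_perp) / (I_par + G * I_perp). The sketch below applies this standard formula; the instrument correction factor G, the millipolarization threshold, and the function names are assumptions for illustration and would need to be set experimentally.

```python
def polarization_mP(i_parallel, i_perpendicular, g_factor=1.0):
    """Fluorescence polarization in millipolarization units (mP)."""
    corrected = g_factor * i_perpendicular      # instrument-corrected perpendicular signal
    total = i_parallel + corrected
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return 1000.0 * (i_parallel - corrected) / total

def is_bound(sample_mP, free_query_mP, min_shift_mP=20.0):
    """Call an address positive if polarization rose by a hypothetical minimum shift."""
    return (sample_mP - free_query_mP) >= min_shift_mP

if __name__ == "__main__":
    free = polarization_mP(110.0, 90.0)     # made-up readings for free query
    sample = polarization_mP(130.0, 70.0)   # made-up readings at one address
    print(free, sample, is_bound(sample, free))
```

Comparing each address against the free query measured under the same conditions avoids relying on an absolute polarization value, which varies between instruments.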

Exemplary Protein Complexes 

The following exemplary protein complexes can be used to verify or optimize 
methods or to provide convenient positive and negative controls, e.g., using known 
interactors of various affinities. Such interactors can include: the signaling proteins 
cdk4-p16, cdk2-p21, E2F4-p130, and the transcription factors Fos-Jun; components of 
the DRIP complex (vitamin D Receptor Interacting Proteins; Rachez (1999) Nature 
398:824 and Rachez (2000) Mol Cell Biol. 20:2718). 

Protein-DNA Screens 

Transcription factors that bind to specific DNA sequences may be identified. 
Here DNA is the query molecule and can be fluorescently labeled. Alternatively, the 
DNA can be biotinylated and detected by HRP coupled to avidin. 



Protein-Small Molecule Screens 

An array described herein can be used to identify a polypeptide that binds a small 
molecule. The small molecule can be labeled, e.g., with a fluorescent probe, and 
contacted to a plurality of addresses on the array (e.g., prior, during, or after translation of 
the programming nucleic acids). The array can be washed after maintaining the array 
such that the small molecule can bind to a polypeptide with an affinity tag. The signal at 
each address of the array can be detected to identify one or more addresses having a 
polypeptide that binds the small molecule. 

Other signal detection methods include surface plasmon resonance (SPR) and 
fluorescence polarization (FP). Methods for using FP are described, for example, in U.S. 
Patent No. 5,800,989. Methods for using SPR are described, for example, in U.S. Patent 
No. 5,641,640; and Raether (1988) Surface Plasmons, Springer Verlag. 

In another embodiment, the invention features a method of identifying a small 
molecule that disrupts a protein-protein interaction. The array is programmed with a first 
and a second nucleic acid which respectively encode a first and second polypeptide 
which interact. The first polypeptide includes an affinity tag and the second polypeptide 
includes a recognition tag. A unique small molecule is contacted to an address of the 
array (e.g., prior to, during, or after translation of the programming nucleic acids). The 
array can be washed after being maintained under conditions in which the small molecule, the first 
polypeptide, and the second polypeptide can interact. The signal at each address of the array is detected to 
identify one or more addresses having a small molecule that disrupts the protein-protein 
interaction. 

Pre-Clinical Evaluation of Lead Compounds 

An application that exploits the ability to screen for small molecule interactions 
with proteins could be the pre-clinical evaluation of a lead drug candidate. Drug 
toxicities often result not from the intended activity on the target protein, but from some 
activity on one or more unrelated binding proteins. Even when these adventitious binding 
proteins do not cause toxicity, they can adversely affect the drug's pharmacokinetics. A 
comprehensive protein array would make the pre-clinical identification of these 
adventitious binders rapid and straightforward. 



Medicinal Chemistry 

The small molecule screen could become a rapid and powerful platform on which 
medicinal chemistry and structure-activity relationship (SAR) studies could be performed. 
Chemical modifications of small molecules could be tested against the array to see if changes improve specificity. 
Compounds could be exposed first to hepatic lysates or other metabolic extracts that 
mimic metabolism in order to create potentially toxic metabolites that can also be 
screened for secondary targets. Recursion of this process could lead to improved 
specificity and tighter binding molecules. 

Mass Spectroscopy 

The polypeptide array can be used in conjunction with mass spectroscopy, e.g., to 
detect a modified region of the protein. An array is prepared as described herein with 
due consideration for the flatness, conductivity, registration and alignment, and spot 
density appropriate for mass spectroscopy. 

In one embodiment, the method identifies a polypeptide substrate for a modifying 
enzyme. Each address is provided with a nucleic acid encoding a unique test 
polypeptide. Each address of the array is contacted with the modifying enzyme, e.g., a 
kinase, a methylase, a protease, and so forth. The enzyme can be synthesized at the 
address, e.g., by including a nucleic acid encoding it at the address with the nucleic acid 
encoding the test sequence. After sufficient incubation to assay the modification step, 
each address is proteolyzed, e.g., trypsinized. The resulting peptide mixtures can be 
subject to MALDI-TOF mass spectroscopy analysis. The combination of peptide 
fragments observed at each address can be compared with the fragments expected for an 
unmodified protein based on the sequence of nucleic acid deposited at the same address. 
The use of computer programs (e.g., PAWS) to predict trypsin fragments is routine in the 
art. Thus, each address of the array can be analyzed by MALDI. Addresses containing 
modified peptide fragments relative to a predicted pattern or relative to a control array can 
be identified as containing potential substrates of the modifying enzyme. 

The amount of modifying enzyme contacted to an address can be varied, e.g., 
from array to array, or from address to address. 

For example, this approach can be used to identify phosphorylation by comparing 
the masses of peptide fragments from an address having a kinase and an address 
lacking the kinase. Pandey and Mann (2000) Nature 405:837 describe methods of using 
mass spectroscopy to identify protein modification sites. 
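
The comparison of predicted and observed tryptic fragments can be illustrated with a short sketch. It is not the PAWS program and is not a method taken from this disclosure; it assumes a simplified trypsin rule (cleavage after K or R unless followed by P, no missed cleavages), neutral monoisotopic masses, a single phosphate per peptide, and a hypothetical mass tolerance `tol`. The function names are illustrative only.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per peptide (free N- and C-termini)
PHOSPHO = 79.96633  # mass added by one phosphate group (HPO3)


def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def phospho_candidates(sequence, observed_masses, tol=0.02):
    """Return predicted peptides whose mass plus one phosphate was observed.

    `tol` is an assumed absolute tolerance in Da, not a recommended value.
    """
    hits = []
    for pep in tryptic_peptides(sequence):
        expected = peptide_mass(pep) + PHOSPHO
        if any(abs(m - expected) <= tol for m in observed_masses):
            hits.append(pep)
    return hits
```

Run once per address against the masses observed there, a non-empty result marks the address as containing a candidate kinase substrate. Real spectra report ion masses (for example, singly protonated species), so observed values would need to be converted to neutral masses before the comparison.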

In another embodiment, the modifying enzyme is varied at each address, and the 
test polypeptide, the peptide with the affinity tag for attachment to the substrate, is 
the same at each address. Both the modifying enzyme and the test polypeptide can be 
synthesized on the array by translation of encoding nucleic acid sequences. Mass 
spectroscopy is used to identify an address having a modifying enzyme with specificity 
for the test polypeptide as enzyme-substrate. 

Mass spectroscopy can also be used to detect the binding of a second polypeptide 
to the target protein. A first nucleic acid encoding a unique target amino acid sequence 
and an affinity tag is disposed at each address in the array. A pool of nucleic acids 
encoding candidate amino acid sequences is also disposed at each address of the array. 
Each address of the array is translated and washed to remove unbound proteins. The 
proteins that remain bound at each address, presumably by direct interaction with the 
target proteins, can then be detected and identified by mass spectroscopy. 

Assay to Identify Folded Proteins 

The NAPPA array can be used to identify appropriately folded protein species, or 
proteins with appropriate stability. For example, arrays can be provided with a nucleic 
acid sequence encoding a random amino acid sequence, a designed amino acid sequence, 
or a mutant amino acid sequence at each address. Such an array can be used to analyze 
the results of a computer-designed polypeptide, the results of a DNA-shuffling, or 
combinatorial mutagenesis experiment. The array is contacted with transcription and 
translation effectors, and subsequently washed to provide purified polypeptides at each 
address. 

Subsequently, each address of the array is monitored for a property of the folded 
species. The property can be particular to the desired polypeptide species. For example, 
the property can be the ability to bind a substrate. Alternatively, the property can be 
more general, such as the fluorescence emission profile of the polypeptide when excited 
at 280 nm. Fluorescence, particularly of tryptophan residues, is an indicator of the extent 
of burial of aromatic groups. Upon denaturation, the center of mass of the fluorescence 
of exposed tryptophans is shifted. In addition, at an appropriate detection wavelength, 
the intensity of fluorescence varies with the extent of folding. The array, or selected 
addresses of the array, can be incrementally exposed to increasing denaturing conditions, 
e.g., by thermal or chemical denaturation. Thermal denaturation is useful as it does not 
require altering solutions contacting the array. Thus, if the array contains partitions, 
subsequent to the washing step, binding of the affinity tag to its handle on the substrate is 
not required. Addresses showing cooperative folding transitions or increased stability are 
thus readily identified. 

Additional properties for monitoring folding include fluorescent detection of ANS 
binding, and circular dichroism. 
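
As an illustration of how a thermal denaturation series at one address might be reduced to a single stability value, the following sketch estimates the midpoint of the transition. It is an assumption-laden simplification, not a procedure from this disclosure: it treats the first reading as fully folded and the last as fully unfolded, normalizes the signal between those two values, and linearly interpolates the temperature at which the normalized signal crosses 0.5. The function name `melting_midpoint` is hypothetical.

```python
def melting_midpoint(temps, signal):
    """Estimate the transition midpoint from a fluorescence melt series.

    temps  -- temperatures in increasing order
    signal -- fluorescence reading at each temperature
    Assumes signal[0] is the folded baseline and signal[-1] the unfolded one.
    """
    folded, unfolded = signal[0], signal[-1]
    if folded == unfolded:
        raise ValueError("no net change in signal; cannot normalize")
    # Fraction unfolded, 0 at the first reading and 1 at the last.
    frac = [(s - folded) / (unfolded - folded) for s in signal]
    for i in range(1, len(frac)):
        a, b = frac[i - 1], frac[i]
        # Look for the first interval that brackets the 0.5 crossing.
        if a != b and (a - 0.5) * (b - 0.5) <= 0:
            step = temps[i] - temps[i - 1]
            return temps[i - 1] + (0.5 - a) * step / (b - a)
    raise ValueError("no transition found")
```

Addresses could then be ranked by the returned midpoint, with higher values suggesting increased stability. A fuller analysis would fit a two-state model with sloping baselines, which also gives a measure of cooperativity.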

Selection with Display Technologies 

In another aspect, the NAPPA platform is used to screen - in a massively parallel 
format - a first collection of polypeptides for binding to members of a second collection 
of polypeptides. 

The first collection of polypeptides is prepared in a display format, e.g., on a 
bacteriophage, a cell, or as a nucleic acid-polypeptide fusion (Smith and Petrenko 
(1997) Chem. Rev. 97:391; Smith (1985) Science 228:1315; Roberts and Szostak (1997) 
Proc. Natl. Acad. Sci. USA 94:12297). For a review of display technologies see Li 
(2000) Nat. Biotech. 18:1251. The first collection can be obtained from any source, e.g., 
a source described herein. In one illustrative example, the first collection is an artificial 
antibody library. 

The second collection of polypeptides is distributed on an array described herein. 
For example, a nucleic acid encoding each polypeptide of the second collection can be 
disposed at a unique address of the array. The array is prepared as described herein. 

Before, during, or after translation of the encoding nucleic acids, the first 
collection in display format, termed display polypeptides, is applied to the array. After 
translation of the encoding nucleic acid, the array is washed to remove unbound display 
polypeptides. Then, the presence of a display polypeptide at at least one address is detected, 
e.g., by amplification of the nucleic acid portion of a nucleic acid-polypeptide fusion; by 
propagation of a cell or bacteriophage displaying the display polypeptide; and so forth. 

Extracellular Proteins 

In one embodiment, an extracellular polypeptide or extracellular domain can be 
displayed on a NAPPA array, e.g., by contacting the array with conditions similar to the 
extracellular, endoplasmic reticulum, or Golgi milieu. For example, the conditions can 
be oxidizing or can have a redox potential that is optimized for extracellular protein 
production. The array can be additionally contacted with modifying enzymes found in 
the secretory pathway, e.g., glycosylases, proteases, and the like. 

In another embodiment, the translation effector is applied in conjunction with 
vesicles, e.g., endoplasmic reticular structures. The vesicles can include an affinity tag to 
anchor the vesicle to the array. In such an embodiment, the encoding nucleic acid need 
not contain an affinity tag. 

An array of extracellular proteins or extracellular protein domains can be used to 
identify interactions with other extracellular proteins; or alteration of living cells (e.g., the 
adhesive properties, motility, or the secretory repertoire of a cell contacting the 
extracellular protein). 

Transmembrane Proteins 

Transmembrane proteins can be displayed on a NAPPA array by separately 
producing the nucleic acids encoding the ecto- or extracellular domains, and the 
cytoplasmic domains. The extracellular domains and the cytoplasmic domains can be 
encoded at separate addresses or the same address. Alternatively, only one of the two 
types of domains is encoded on the array. 

In another embodiment, the transmembrane domain can be excised. Ottemann et 
al. (1997) Proc. Natl. Acad. Sci. USA 94:11201-4 describe a method for excising a 
transmembrane domain to generate a soluble functional protein. 

In yet another embodiment, in vitro translation on the array further includes 
providing vesicles derived from endoplasmic reticulum. 

Contacting Array with Cells 

In another embodiment, at least one address of the array, e.g., after translation of 
the encoding nucleic acids, is contacted with a living cell. After contacting the array, the cell 
or a cell parameter is monitored. For example, polypeptide growth factors can be arrayed 
at different addresses, and cells assayed after contact to each address. The cells can be 
assayed for a change in cell division, apoptosis, gene expression (e.g., by gene expression 
profiling), morphology changes, differentiation, proteomics analysis (e.g., by 2-D gel 
electrophoresis and mass spectroscopy), and specific enzymatic activities. 

In one embodiment, a test polypeptide of the array can be detached from the 
substrate of the array, e.g., by proteolytic cleavage at a specific protease site located 
between the test sequence and the tag. 

In another embodiment, the test polypeptide does not have an affinity tag, but is 
maintained at an address by physical separation from other addresses of the plurality. 
The translation effector is optionally not washed from the address. Cells are assayed 
after being maintained at the address as described above. 

Cell-Free Assay Platforms 

High-throughput, genome-wide screens for protein-protein, protein-nucleic acid, 
protein-lipid, protein-carbohydrate, and protein-small molecule interactions can be 
performed on an array described herein. Each address of the array can include a 
polypeptide encoded by a nucleic acid clone from a repository of full-length genes, e.g., 
genes stored in a vector that facilitates rapid shuttling by recombinational cloning. 



Kits 

Kits are convenient collections of components, e.g., reagents that can be supplied 
to a user in order to efficiently enable the user to practice a method described herein. 

Universal Primer Kit. A universal primer kit provides a simple means for 
amplifying a collection of encoding nucleic acid sequences in a format suitable for 
disposal on an array. The kit includes a 5' universal primer and a 3' universal primer. 
The kit can further include a substrate, e.g., with an appropriate binding agent attached 
thereto. 

The 5' primer can include the T7 promoter and a 5' annealing sequence, whereas 
the 3' primer can include a 3' annealing sequence and sequence encoding an affinity tag. 
Nucleic acid coding sequences amplified with the 5' annealing sequence and the 3' 
annealing sequence are further amplified with the universal primer set. The products of 
this amplification are amenable for immediate disposal on the array. 
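
The layout of the universal primer pair can be sketched as follows. The T7 promoter sequence is the commonly used consensus; the annealing sequences passed in, the choice of tag coding sequence, and the optional stop codon are assumptions made for illustration and are not sequences given in this disclosure. A practical 5' primer would typically also carry translation initiation signals, which are omitted here.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # consensus T7 promoter
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def universal_primers(anneal_5, anneal_3_coding, tag_coding, stop_codon="TAA"):
    """Build the 5' and 3' universal primers, both written 5' to 3'.

    anneal_5        -- 5' annealing sequence (coding-strand sense)
    anneal_3_coding -- 3' annealing sequence as it reads on the coding strand
    tag_coding      -- in-frame coding sequence for the affinity tag
    stop_codon      -- appended after the tag (an assumption of this sketch)
    """
    forward = T7_PROMOTER + anneal_5
    # On the coding strand the order is: annealing site, tag, stop.
    # The 3' primer is the reverse complement, so its 3' end anneals
    # to the annealing site and its 5' tail adds the tag.
    reverse = reverse_complement(anneal_3_coding + tag_coding + stop_codon)
    return forward, reverse
```

For example, a hexahistidine coding sequence such as CATCACCATCACCATCAC could be supplied as `tag_coding`; it appears in the 3' primer as GTGATGGTGATGGTGATG.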

Moreover, asymmetric PCR can be utilized to create an excess of the coding 
strand. Single-stranded DNA can be deposited on the array and annealed to a T7 
promoter nucleic acid capture probe in order to provide a duplex recruitment site for T7 
polymerase. 

The kit can further include transcription and/or translation effectors, reagents for 
amplification, and buffers. 

Recombinational Cloning Kit. A recombinational cloning kit provides tools for 
shuttling multiple encoding nucleic acid sequences, preferably en masse, into a vector 
having suitable regulatory sequences, and affinity tag-encoding sequence for the NAPPA 
platform. The kit includes a substrate with multiple addresses, each address having a 
binding agent attached to the substrate. The kit also includes a vector having sequences 
for generating encoding nucleic acid with affinity tags. Once a nucleic acid sequence is 
cloned into the vector, the nucleic acid of the vector with the insert is suitable for [...] 

The vector can include a recombination site, e.g., a site-specific recombination 
site or a homologous recombination site. Alternatively, the vector can include unique 
[...] peptides. These features facilitate the rapid and parallel construction of multiple 
coding nucleic acids for programming the array. Thus, a complex array having many 
unique polypeptide sequences can be easily produced. 

For example, a repository of cloned full-length coding sequences 
flanked by recombination sites is constructed. Multiple sequences in the repository are 
[...] techniques (see description of Recombinational cloning above, and The Gateway 
Manual, Invitrogen, CA). Robotics and microtiter plates can be used to rapidly 
produce the multiple coding nucleic acids for programming the array. 

The kit can further include a second vector having recombination sites, 
appropriate regulatory sequence, and a recognition tag, such as a recognition tag 
described herein. The user can thus shuttle a nucleic acid encoding a sequence of interest 
into both a vector with an affinity tag, and a vector with a recognition tag. This 
compatibility facilitates the generation of protein-protein interaction matrices. 

Network Architecture for Providing a NAPPA Array 

Referring to Figure 3, a user system 14 and a request server 20 are connected by a 
network 12, e.g., an intranet or an internet. For example, the user system and the request 
[...] request server in an applications department. Alternatively, the user system 14 can be 
located within one company, e.g., in a diagnostics division, and the request server 20 can 
be located in a second company, e.g., a protein array provider. The companies can 
be connected by a network, e.g., by the Internet, a proprietary network, a dial-up 
connection, a wireless connection, an intermediary, or a customized procurement 
network. A network within a company can be protected by a firewall 19. 

The request server 20 is connected to a database server 22. The database server 
22 can contain one or more tables with records for amino acid sequences of polypeptides 
(e.g., a relational database). For example, each record can contain one or more fields for 
the following: the amino acid sequence; the location of a nucleic acid clone encoding the 
amino acid sequence in a repository or clone bank; a category field; binding ligands of the 
polypeptide; co-localizing and/or binding polypeptides; links (e.g., hypertext links to 
other resources); and pricing and quality control information. The database can also 
contain one or more tables for classes and/or subsets of amino acid sequence. For 
example, a class can contain entries for amino acid sequences expressed in a particular 
tissue, correlated with a condition or disease, originating from a species, having 
homology to a protein family, related to a biological (e.g., physiological or cellular) 
process, and so forth. 

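A relational layout of the kind described for the database server can be sketched with an in-memory SQLite database. The table names, column names, and the helper functions `build_catalog` and `clone_locations_for_class` are hypothetical and chosen only for illustration; the sketch covers a subset of the fields listed above and does not represent the actual schema of any system described here.

```python
import sqlite3

# Hypothetical schema: one table of polypeptide records, one table
# assigning polypeptides to classes (tissue, disease, family, ...).
SCHEMA = """
CREATE TABLE polypeptide (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    amino_acid_sequence TEXT NOT NULL,
    clone_location TEXT NOT NULL,
    category TEXT,
    price REAL
);
CREATE TABLE class_member (
    class_name TEXT NOT NULL,
    polypeptide_id INTEGER NOT NULL REFERENCES polypeptide(id)
);
"""


def build_catalog(records, memberships):
    """Create an in-memory catalog from record and membership tuples."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO polypeptide VALUES (?, ?, ?, ?, ?, ?)", records)
    db.executemany("INSERT INTO class_member VALUES (?, ?)", memberships)
    return db


def clone_locations_for_class(db, class_name):
    """Return (name, clone_location) for every member of a class."""
    rows = db.execute(
        "SELECT p.name, p.clone_location FROM polypeptide p "
        "JOIN class_member c ON c.polypeptide_id = p.id "
        "WHERE c.class_name = ? ORDER BY p.name",
        (class_name,),
    )
    return rows.fetchall()
```

A request server built along these lines could answer a class selection from the user with the clone bank locations that are then forwarded to the robot controller.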
The request server 20 sends to the user 14 one or more choices for amino acid 
sequence to include on a microarray. The choices are provided in a user-friendly format, 
e.g., a hypertext page with forms (e.g., selection boxes). The choices can be hierarchical, 
e.g., a first list of choices to determine general user needs, and subsequent choices, e.g., of 
a class of amino acid sequence, or of individual amino acid sequences. The choices can 
also include predesigned microarrays, as well as individually customized designs. The 
server can also recommend appropriate negative and positive control amino acid 
sequence to include depending on previous selections. Alternatively, the system can be 
voice-based; the queries and selections are transmitted across a telecommunications 
network, e.g., a telephone, a mobile phone, etc. 

The user indicates selections, e.g., by clicking on a form provided on a web page. 
The request server forwards the selections, e.g., the location of nucleic acid encoding a 
selected amino acid sequence in a clone bank, to a clone bank robot controller. The robot 
controller 26 mobilizes a robot to access the clone bank and obtain the desired encoding 
nucleic acid. Optionally, the nucleic acid can be shuttled from a repository vector into an 
expression vector using recombinational cloning techniques. In another possible 
implementation, the nucleic acid stored in the repository is already in an appropriate 
expression vector for nucleic acid programmable protein microarray production. In still 
another possible implementation, the nucleic acid is amplified with primers which 
contain the requisite flanking sequence for disposal on the microarray. For example, one 
or more primers can include a T7 promoter, and/or an affinity tag. 

Once obtained, the nucleic acid is provided to an array maker. The array 
processing server 24 is also interfaced with the request server 20 and the robot controller 
26. The nucleic acid is deposited onto one or more array substrates, e.g., using a method 
described herein. The array production controller selects one or more addresses at which 
the nucleic acid is deposited, and records the addresses in a table associated with the 
array being produced. The array production controller can also vary the amount and 
method of deposition for any particular sample or address. Such variables and additional 
quality control information are also stored in the table. 
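
The bookkeeping performed by the array production controller can be illustrated with a minimal sketch. The row-by-row assignment order, the `replicates` parameter, and the dictionary layout of each table entry are assumptions of this example; the disclosure does not specify how addresses are chosen.

```python
from itertools import product


def assign_addresses(clone_ids, rows, cols, replicates=1):
    """Assign each clone to `replicates` grid addresses, row by row.

    Returns a list of table entries, one per deposited spot, that can be
    stored alongside deposition amounts and quality control results.
    """
    if len(clone_ids) * replicates > rows * cols:
        raise ValueError("not enough addresses on the substrate")
    free = product(range(rows), range(cols))  # (row, col) in row-major order
    table = []
    for clone in clone_ids:
        for _ in range(replicates):
            row, col = next(free)
            table.append({"row": row, "col": col, "clone": clone})
    return table
```

Each entry could be extended with fields for the deposited amount and deposition method, so that the same table records the per-address variables mentioned above.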

If multiple identical arrays are produced in parallel, one or more 
arrays can be used for quality control testing. For example, transcription and translation 
effectors can be contacted to the array at the production facility. The presence of selected 
or control proteins is verified by contacting the array with specific antibodies for such 
proteins, and detecting the binding. 

Once produced, an array is prepared for shipping, for example, contacted with a 
preservative solution, desiccated, and/or coated in an emulsion, film, or plastic wrap. 
The request server 20 interfaces with a courier system 34, e.g., to track shipment and 
delivery of the array to the user. The request server also notifies the user of the status of 
the array production and shipment throughout the procurement process, e.g., using 
electronic mail messages. 

The request server interfaces with a business-to-business server to initiate 
appropriate billing and invoicing as well as to process customer service requests. 

Diagnostic Assays 

A variety of polypeptide microarrays can be provided for diagnostic purposes. 
The array can be used as a screening tool to look for antibodies that bind to specific 
proteins. This could be applied for the generation of monoclonal antibodies in a 
high-throughput setting or in the context of measuring immune responses in a patient. ELISA 
techniques can be used for detection. 

Antigen Arrays. One class of such arrays is an array of antigens, displayed for 
the purpose of determining the specificity of antibodies in a subject. The array is 
programmed such that each address represents a different antigen of a pathogen or of a 
malady (e.g., antigens significant in allergies; transplant rejection and compatibility 
testing; and auto-immune disorders). 

In one embodiment, the array has antigens from a plurality of bacterial organisms. 
Computer programs can be optionally used to predict likely antigens encoded by the 
genome of an organism (Pizza et al (2000) Science 287:1816). In a preferred 
embodiment, each address has disposed thereon a unique antigen. In another preferred 
embodiment, each address has a plurality of antigens, all being from the same species. 
Thus, for example, binding of a subject's antibody to an address indicates that the subject 
has been exposed to a pathogen represented by the address. 
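
One way to turn per-address binding signals into a list of likely exposures is sketched below. The cutoff rule (mean of the negative control addresses plus three sample standard deviations) is a common convention adopted here as an assumption, not a rule stated in this disclosure, and the function names and address labels are hypothetical.

```python
from statistics import mean, stdev


def positive_addresses(signals, negative_controls, n_sd=3.0):
    """Addresses whose signal exceeds the negative-control cutoff.

    signals           -- dict mapping address label to measured signal
    negative_controls -- signals from addresses carrying no antigen
    n_sd              -- number of standard deviations above the mean (assumed)
    """
    cutoff = mean(negative_controls) + n_sd * stdev(negative_controls)
    return sorted(a for a, s in signals.items() if s > cutoff)


def exposures(signals, negative_controls, address_to_pathogen):
    """Pathogens represented by at least one positive address."""
    hits = positive_addresses(signals, negative_controls)
    return sorted({address_to_pathogen[a] for a in hits})
```

When each address carries several antigens from one species, the same mapping applies, with one pathogen per address.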

In another preferred embodiment, the array is used to track the progression of 
complex diseases. For example, diseases with antigenic variation (e.g., malaria, and 
trypanosomiasis) can be accurately diagnosed and/or monitored by identifying the 
repertoire of specific antibodies in a subject. 

In another embodiment, the array can be used to detect the specific target of an 
autoimmune antibody. For example, isolated antibodies or serum from a subject having 
type I diabetes are contacted to an array having islet-cell specific proteins present at 
different addresses of the array. 

Antigen arrays also provide a convenient means of monitoring vaccinations and 
disease exposure, e.g., in epidemiological studies, veterinary quarantine, and public 
health policy. 

Antibody Arrays. A second class of diagnostic arrays is arrays of antibodies. A 
variety of methods are available for identifying antibodies. Monoclonal antibodies 
against a variety of antigens are identified. The nucleic acids encoding such antibodies 
are sequenced from the genome of hybridoma cells. The nucleic acid sequence is used to 
engineer single-chain variants of the antibody. Thus, although the two domains of the Fv 
fragment, VL and VH, are coded for by separate genes, they can be joined, using 
recombinant methods, by a synthetic linker that enables them to be made as a single 
protein chain in which the VL and VH regions pair to form monovalent molecules 
(known as single chain Fv (scFv); see e.g., Bird et al. (1988) Science 242:423-426; and 
Huston et al. (1988) Proc. Natl. Acad. Sci. USA 85:5879-5883). The encoding nucleic 
acid sequence can be recombined into an appropriate vector, e.g., a vector described 
above with promoter and affinity tag encoding sequences. 

In addition, the antibody sequence can be engineered to remove disulfides (Proba 
K (1998) J Mol. Biol. 275:245-53). Alternatively, after translation and washing of the 
array, the array is subject to oxidizing conditions, e.g., by contacting with glutathione. 
The antibodies can be coupled to the array with streptococcal protein G, or S. aureus 
protein A. Further, specialized antibodies such as modified or CDR-grafted versions of 
naturally occurring antibodies devoid of light chains can be used. The antibodies of 
camels (e.g., Camelus dromedarius) are naturally devoid of light chains (Hamers-Casterman 
C (1993) Nature 363:446-8; Desmyter et al. (1996) Nat Struct Biol 3(9):803-11). 

A patient sample can then be contacted to the array. Non-limiting examples of 
patient samples include serum proteins, proteins extracted from a biopsy obtained from 
the patient, and so forth. In addition, cells themselves can be contacted to the array in 
order to query for antigens displayed on the cell surface. 

In one embodiment, the sample is modified with a compound prior to being 
contacted to the array. For example, the sample can be biotinylated. Addresses that bind 
proteins in the sample are then identified by contacting the array with labeled streptavidin 
or labeled avidin. In another embodiment, the sample is unlabelled. MALDI, SPR, or 
another technique is used to determine whether a protein is bound at each address. Arrays can 
be designed to identify proteins associated with various maladies, e.g., to detect antigens 
associated with cancer at various stages (for example, early, and pre-metastatic stages) or 
to provide a prediction (for example, to quantitate the abundance of an antigen correlated 
with a condition). 

Vaccine Development 

The NAPPA arrays provide an improved method for developing a vaccine. One 
preferred embodiment includes identifying possible antigens for use in a vaccine from the 
sequenced genome of a pathogen. Pizza et al. (2000) Science 287:1816 describe routine 
computer-based methods for identifying ORFs which are potentially surface exposed or 
exported from a pathogenic bacterium. The method further includes making 1) a nucleic 
acid that serves as a DNA vaccine for expressing each candidate antigen, and 2) a nucleic 
acid encoding the ORF and an affinity tag in order to program an array. The 
recombinational cloning methods described herein are amenable for generating such a 
collection of nucleic acids. 

The nucleic acids serving as a DNA vaccine can be assembled into multiple 
random pools and used to immunize a plurality of subjects, e.g., mice. Subsequently, 
each immunized subject is challenged with the pathogenic organism. Serum is collected 
from subjects with improved immunity. 

An array is provided with a unique encoding nucleic acid at each address. The 
array is translated and then contacted with the serum from a subject with improved 
immunity. Binding of a serum antibody to an address is indicative of the address 
having a polypeptide that is an antigen useful for vaccination against the pathogen. 

In another embodiment, a DNA vaccine is substituted with conventional injection 
of antigens, e.g., as described in Pizza et al, supra. 

Network for Diagnostic Assay 

Referring to Figure 4, a network links health care providers 50, subjects 42, and 
an intermediary server for the purpose of providing results of diagnostic NAPPA arrays. 
Health care providers can include a primary care physician 44; and a specialist physician 
48, e.g., infectious disease specialist, rheumatologist, hematologist, oncologist, and so 
forth; and pathologists 46. Within a health care institution, such providers can be linked 
by an internal network 50 attached to an external network 41 by a firewall 51. 
Alternatively, the providers can be located on different internal networks that can 
communicate, e.g., using secure and/or proprietary protocols. The external network can 
be the Internet or other well-distributed telecommunications network. 

The subject can be a human patient, an animal, a forensics sample, or an 
environmental sample (e.g., from a waste system). 

A sample, e.g., of blood, cells, biopsy, serum, or bodily fluid, provided by the 
subject is delivered to the array diagnostic service, for example by a courier. Tracking 
provided by the courier system 64 can monitor delivery. The delivered sample is 
analyzed according to instructions, e.g., accompanying the sample, or provided across the 
network. The instructions can indicate suspected disorders and/or requested assays. 

The array is programmed such that after translation, each address will contain a 
different antigen or antibody (e.g., as described above). For common diagnostics, 
NAPPA arrays can be prepared in bulk at the same or another facility. 

The sample is optionally processed and then is contacted to a nucleic acid 
programmable array, e.g., before or after translation of the encoding nucleic acid. 
Sample handling and detection can be controlled automatically by the array diagnostic 
server 56 which is interfaced with robotic and detection equipment. The binding of the 
sample to the array is then detected by the array diagnostic server 56. Addresses wherein 
binding of the sample to the array is detected are recorded, e.g., in a table that is stored in a 
database server 58. An intermediary server 54 is used to transmit results, e.g., securely, 
back to the health care providers, e.g., the primary care physicians 44, and the specialist 
48. Optionally, the patient or subject can be directly notified if results are available. 

The results can be stored in the database server 58 and/or transmitted to one or 
more of the physicians and health-care providers. The results also may be made 
available, e.g., for meta-analysis by public health authorities and epidemiologists. 



Informatics 

A computer system containing a repository of observed interactions is also 
featured. The computer system can be networked to receive data, e.g., raw data or 
processed data, from a data acquisition apparatus, e.g., a microchip slide scanner, or a 
fluorescence microscope. 

The computer system includes a relational database. The database houses all data 
from multiple screens, e.g., using different arrays. One table contains table rows for each 
experiment, e.g., describing the microarray production number, experiment date, 